You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org> on 2016/06/14 23:51:57 UTC

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Bikramjeet Vig has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3385

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following query format:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
10 files changed, 705 insertions(+), 258 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Hello David Knupp, Henry Robinson, Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3385

to look at the new patch set (#13).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Performance:
For a table with 10000 partitions and 1 file per partition

                     execResetMetadata()       Total Execution Time

Refresh Table              3795 ms                   4630 ms

Refersh Partition            42 ms                    680 ms

We see that the time to refresh improves by a factor of 90x but due to
significant overhead of about 640ms in this case the effective improvement
is about 7x. As the size of the table and number of partitions increase,
this improvement would be more significant.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
11 files changed, 406 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/13
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 1:

I believe your editor is doing some formatting and gerrit marks it as changed lines. It is confusing and not helpful for anyone who wants to review this code (lots of noise). Can you fix that and resubmit this patch so that only the things that changed in this patch are marked as such?

-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has abandoned this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Abandoned

Moved to ASF

-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 11:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/3385/11//COMMIT_MSG
Commit Message:

PS11, Line 32: .
,


PS11, Line 33: This
this (part of previous sentence).


http://gerrit.cloudera.org:8080/#/c/3385/11/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 1178:  throws CatalogException {
indentation is wrong here


http://gerrit.cloudera.org:8080/#/c/3385/11/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

Line 1943:    * 'hmsPartition'.
Comment what happens if oldPartition is null (the partition is added, with nothing removed)


http://gerrit.cloudera.org:8080/#/c/3385/11/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

Line 1: # Copyright (c) 2015 Cloudera, Inc. All rights reserved.
Remove this line - copyright belongs to ASF now.


PS11, Line 56: rows.pop()
tiny cleanup: just do rows[:-1] on line 61


Line 83:         'alter table %s add partition (y = 333, z = 5309)' % table_name)
choose a consistent style for writing partition specs. spaces around '=' or not? only before or after? choose one (my preference for no spaces).


Line 198:     self.client.execute("alter table %s add partition (year=2010, month =1)" %
month=1


Line 258:     self.client.execute("refresh %s partition (year=2010, month =2)" % table_name)
month=2


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 10: Code-Review+1

(18 comments)

Carrying forward David's +1

http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
File fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java:

Line 393:     // except for DESCRIBE TABLE or REFRESH/INVALIDATE statements
> Do we really register column-level auth requests for refresh/invalidate sta
Since we require "ANY" privilege for refresh, the partition analyzer adds an "ANY" column privilege request. This 'if' block enforces the condition that if an AuthorizeableColumn request exists, it should be preceded by a table level "SELECT" request (See Line 492), but since we only have another table level "ANY" request, this path will throw an error.


Line 439:             analysisResult_.isResetMetadataStmt());
> Why do we need this? My understanding is that we'll never register column-l
As explained above, we do require a column level privilege and hence this condition is required


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

Line 36:   // not null when refreshing a single partition specified by this partition
> not null when refreshing a single partition.
Done


Line 53:   public void analyze(Analyzer analyzer) throws AnalysisException {
> This seems to allow "invalidate metadata <part_spec>" as well. I think its 
This case will be caught by the parser but I'll still put a precondition for that and I have added some test cases in parser test as well.


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 1175:    */
> Can you explain a little on the return 'tbl' of this method?
Done


Line 1182:       HdfsTable hdfsTable = (HdfsTable) tbl;
> can you use hdfsTable consistently instead of tbl below for clarity? for ex
Done


Line 1183:       HdfsPartition hdfspartition = hdfsTable
> hdfsPartition
Done


Line 1195:         TTableName tblName = new TTableName(tbl.getDb().getName().toLowerCase(),
> I don't think you really need this if you take care of my other comments he
Done


Line 1200:               db.getName(), tblName.getTable_name(), partitionName);
> use tbl.getDb().getName(), tbl.getName()
Done


Line 1211:               "Error loading metadata for partition: " + db.getName() + "."
> usu tbl.getFullName()
Done


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

Line 1936: 
> no need for newline
Done


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

Line 2752:           if (req.isSetPartition_spec()) {
> Same Preconditions check as my comment in Analyzer (to prevent "invalidate 
The 'if' statements preceding this make sure that execution will only reach this point if its a refresh statement and we have a valid table name and therefore a non null table object.
Also, in case we have a malformed partition_spec then we wont get a valid partition object and the hive rpc will throw an exception. this will make sure nothing breaks


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java:

Line 667:     AnalysisError(
> add test where you try to refresh a partition of an unpartitioned table
Done


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java:

Line 785:     //Positive cases for checking refresh partition
> nit: space before "Positive"
Done


Line 786:     AuthzOk("refresh functional.alltypesagg partition (year=2010, month =1, day = 1)");
> nit: use consistent style with partition key values, e.g. year=2010 (no spa
Done


Line 813:     AuthzError(
> also add a test where you try to refresh a non-existent partition of a tabl
Done


http://gerrit.cloudera.org:8080/#/c/3385/10/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

Line 79:         'create table if not exists %s (x int) partitioned by (y int, z int)' %
> Remove the "IF NOT EXISTS" part (and elsewhere in this file).
Done


Line 230:       CREATE TABLE %s LIKE functional.alltypes STORED AS PARQUET
> nit: why keyword in caps here and everywhere else lowercase?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3385/9/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS9, Line 27: class ImpalaTableWrapper(object):
> So, if we were to make something like this class generally available, it sh
My take is that while this could be useful for some test code, it's a bit superfluous here for these reasons:

1. Tables created with ImpalaTableWrapper are using unique_database in all instances. unique_database will handle the "unique namespace" for the created table, and unique_database will DROP CASCADE, which handles cleanup, even during exception, thanks to the fixture's finalizer.

2. Only 1 table is being created for each test anyway. Since they are using unique_database already, they have exclusion from each other already.

3. The context usages of ImpalaTableWrapper are for the full durations of tests. I.e., the table is created at the very beginning of the test proper and is expected to persist until the end. In that sense, it "bumpers" unique_database.

If instead a test needed to manage many tables at a time, or deal with separate contexts of tables' existing, then I could see a stronger reason to have this.

Do you anticipate such tests?


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 12: Code-Review+2

(9 comments)

Carrying forward Henry's +2.

http://gerrit.cloudera.org:8080/#/c/3385/11//COMMIT_MSG
Commit Message:

PS11, Line 32: .
> ,
Done


PS11, Line 33: This
> this (part of previous sentence).
Done


http://gerrit.cloudera.org:8080/#/c/3385/11/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 1178:  throws CatalogException {
> indentation is wrong here
Done


http://gerrit.cloudera.org:8080/#/c/3385/11/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

Line 1943:    * 'hmsPartition'.
> Comment what happens if oldPartition is null (the partition is added, with 
Done


http://gerrit.cloudera.org:8080/#/c/3385/11/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

Line 1: # Copyright (c) 2015 Cloudera, Inc. All rights reserved.
> Remove this line - copyright belongs to ASF now.
Done


PS11, Line 56: rows.pop()
> tiny cleanup: just do rows[:-1] on line 61
Done


Line 83:         'alter table %s add partition (y = 333, z = 5309)' % table_name)
> choose a consistent style for writing partition specs. spaces around '=' or
Done


Line 198:     self.client.execute("alter table %s add partition (year=2010, month =1)" %
> month=1
Done


Line 258:     self.client.execute("refresh %s partition (year=2010, month =2)" % table_name)
> month=2
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 12
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3385/2/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS2, Line 48: ImpalaDbWrapper
This is a nicely-implemented context manager for setting up and tearing down databases for your tests, but it appears to duplicate functionality that is already provided by the unique_database pytest fixture (which you also use.)

It would be better to stick with just one method for setting up databases, and I think the preferred choice would be to use the fixture.


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 5:

(12 comments)

Did you forget to add the analysis tests? (see my comment on the previous patch)

http://gerrit.cloudera.org:8080/#/c/3385/5/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

PS5, Line 42: Preconditions.checkArgument(!isRefresh || name != null);
            :     Preconditions.checkArgument(partitionSpec == null || name != null);
Can't we combine these two into a single check?


http://gerrit.cloudera.org:8080/#/c/3385/5/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS5, Line 1164: To achieve that, it drops the partition
              :    * from the table object, fetches a fresh copy of the partition from Hive
              :    * metastore and adds it back to the table.
We no longer do that in this function, so you may want to update the comment.


PS5, Line 1174: Db db = tbl.getDb();
This seems to be used first time in L1195. Let's move it closer to that.


PS5, Line 1175: HdfsPartition refreshedPartition = null;
Where is this variable used?


PS5, Line 1182: // For the case where partition does not exist in Catalog cache but might
              :       // exist in Hive Metastore
I am not sure I follow the comment. I believe it's more accurate to say that you either retrieve the partition name from an existing partition or you construct it from the partition spec.


PS5, Line 1185: getPartitionNameFromThriftPartitionSpec(
I think it's a bit more clear to call this function constructPartitionName(). The partition spec is the input param so it's kind of obvious how the partition name gets constructed.


PS5, Line 1187: %s",
              :           tbl.getFullName() + " " + partitionName));
"%s %s", tbl.getFullName(), partitionName);


PS5, Line 1189: long newCatalogVersion = incrementAndGetCatalogVersion();
              :       catalogLock_.writeLock().unlock();
Move this above L1180. We don't want to hold the catalog lock longer than necessary.


Line 1193:         org.apache.hadoop.hive.metastore.api.Partition partition = null;
I would add a comment here saying: "If the partition does not exist in Hive MetaStore, remove it from the catalog."


PS5, Line 1206:     + tblName.getTable_name() + " " + partitionName,
              :               e);
nit: merge these two lines


http://gerrit.cloudera.org:8080/#/c/3385/5/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS5, Line 1944: data
metadata (files and blocks)


PS5, Line 1948: dropPartition(partitionSpec);
We have a dropPartition function that takes as input a Partition object, so I don't think the partitionSpec parameter is needed here.


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/3385/6/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS6, Line 16: import logging
            : import pytest
Removed unused imports


PS6, Line 19: import shlex
Unused import


PS6, Line 21: import subprocess
Unused import


PS6, Line 23: import *
"import *" is generally regarded as bad practice in python. It's better to import symbols by name.


PS6, Line 52:   class ImpalaTableWrapper(object):
Sorry -- I didn't notice before that this class is nested within the outer test class. I'm not sure that's really necessary here. Unless there's a good reason to do, can you move this class so that it's scoped at the same level as your TestRefreshPartition class?


Line 82:     for row in rows:
This is perfectly well formed, although it would have been also possible to write it as a python one liner using a list comprehension:

  return [row.split('\t')[0:-8] for row in rows]

That's a very common pattern.

But that said, since can't tell what single row looks like, the slice [0:-8] looks odd to me. How long is the original row? How many tokens after splitting on '\t'? What are the 8 elements you're chopping off? How many elements does that leave remaining?

In other words, I know that each "name" is a list of strings, but I can't tell from reading this code how long the list is, or what each element in the resulting list "name" is. I just know that name is eight elements shorter than row.split('\t').


PS6, Line 99: range(0, 16)
You actually don't need the 0, and in python 2.x, it's more customary to use xrange.

   ... for i in xrange(16)


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/3385/5/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

PS5, Line 42: Preconditions.checkArgument(!isRefresh || name != null);
            :     Preconditions.checkArgument(partitionSpec == null || name != null);
> Can't we combine these two into a single check?
Done


http://gerrit.cloudera.org:8080/#/c/3385/5/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS5, Line 1164: To achieve that, it drops the partition
              :    * from the table object, fetches a fresh copy of the partition from Hive
              :    * metastore and adds it back to the table.
> We no longer do that in this function, so you may want to update the commen
Done


PS5, Line 1174: Db db = tbl.getDb();
> This seems to be used first time in L1195. Let's move it closer to that.
These 3 variables can be moved to L1195 but I kept them here because I wanted to minimize amount of code inside critical sections(while holding locks)


PS5, Line 1175: HdfsPartition refreshedPartition = null;
> Where is this variable used?
Done


PS5, Line 1182: // For the case where partition does not exist in Catalog cache but might
              :       // exist in Hive Metastore
> I am not sure I follow the comment. I believe it's more accurate to say tha
Done


PS5, Line 1185: getPartitionNameFromThriftPartitionSpec(
> I think it's a bit more clear to call this function constructPartitionName(
Done


PS5, Line 1187: %s",
              :           tbl.getFullName() + " " + partitionName));
> "%s %s", tbl.getFullName(), partitionName);
Done


PS5, Line 1189: long newCatalogVersion = incrementAndGetCatalogVersion();
              :       catalogLock_.writeLock().unlock();
> Move this above L1180. We don't want to hold the catalog lock longer than n
Done


Line 1193:         org.apache.hadoop.hive.metastore.api.Partition partition = null;
> I would add a comment here saying: "If the partition does not exist in Hive
Done


PS5, Line 1206:     + tblName.getTable_name() + " " + partitionName,
              :               e);
> nit: merge these two lines
Done


http://gerrit.cloudera.org:8080/#/c/3385/5/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS5, Line 1944: data
> metadata (files and blocks)
Done


PS5, Line 1948: dropPartition(partitionSpec);
> We have a dropPartition function that takes as input a Partition object, so
the dropPartition function takes in a HdfsPartition type object but a "org.apache.hadoop.hive.metastore.api.Partition"
type object is being passed to this method. The only option was to use the dropPartition method that took in the partitionSpec object. If I do not pass the partition spec object, I ll have to write a helper method to create that object from the Partition object.

Also, the dropPartition method takes in the HdfsPartition that is already in catalog to remove the stats and block data. If I pass it the new HdfsPartition object that I created, that will mess up all the metadata.


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Hello David Knupp, Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3385

to look at the new patch set (#11).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Performance:
For a table with 10000 partitions and 1 file per partition

                     execResetMetadata()       Total Execution Time

Refresh Table              3795 ms                   4630 ms

Refersh Partition            42 ms                    680 ms

We see that the time to refresh improves by a factor of 90x but due to
significant overhead of about 640ms in this case the effective improvement
is about 7x. As the size of the table and number of partitions increase.
This improvement would be more significant.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
11 files changed, 408 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/11
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 10:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
File fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java:

Line 393:     // except for DESCRIBE TABLE or REFRESH/INVALIDATE statements
Do we really register column-level auth requests for refresh/invalidate statements?


Line 439:             analysisResult_.isResetMetadataStmt());
Why do we need this? My understanding is that we'll never register column-level auth requests for refresh/invalidate statements, so the first condition in this Preconditions check should be sufficient.


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

Line 36:   // not null when refreshing a single partition specified by this partition
not null when refreshing a single partition.

(shorter == better)


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 1182:       HdfsTable hdfsTable = (HdfsTable) tbl;
can you use hdfsTable consistently instead of tbl below for clarity? for example, see L1205-1206 which might seem confusing otherwise


Line 1183:       HdfsPartition hdfspartition = hdfsTable
hdfsPartition


Line 1195:         TTableName tblName = new TTableName(tbl.getDb().getName().toLowerCase(),
I don't think you really need this if you take care of my other comments here.


Line 1200:               db.getName(), tblName.getTable_name(), partitionName);
use tbl.getDb().getName(), tbl.getName()


Line 1211:               "Error loading metadata for partition: " + db.getName() + "."
usu tbl.getFullName()


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

Line 1936: 
no need for newline


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java:

Line 667:     AnalysisError(
add test where you try to refresh a partition of an unpartitioned table


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java:

Line 785:     //Positive cases for checking refresh partition
nit: space before "Positive"


Line 786:     AuthzOk("refresh functional.alltypesagg partition (year=2010, month =1, day = 1)");
nit: use consistent style with partition key values, e.g. year=2010 (no spaces between <col>=<value>

fix elsewhere also


Line 813:     AuthzError(
also add a test where you try to refresh a non-existent partition of a table that the user does not have access to

the test is to make sure that an auth error is reported, and not an analysis error because that would reveal its non-existence to an unprivileged user


http://gerrit.cloudera.org:8080/#/c/3385/10/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

Line 79:         'create table if not exists %s (x int) partitioned by (y int, z int)' %
Remove the "IF NOT EXISTS" part (and elsewhere in this file).

We expect the table not to exist, and if something goes wrong it may be more difficult to debug because we silently ignore that failure (table already exists)


Line 230:       CREATE TABLE %s LIKE functional.alltypes STORED AS PARQUET
nit: why keyword in caps here and everywhere else lowercase?


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has uploaded a new patch set (#7).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
11 files changed, 440 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/7
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

Line 53:   public void analyze(Analyzer analyzer) throws AnalysisException {
This seems to allow "invalidate metadata <part_spec>" as well. I think its better to add a preconditions check / throw Analysis Exception in such a case.


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 1175:    */
Can you explain a little on the return 'tbl' of this method?


Line 1180:       long newCatalogVersion = incrementAndGetCatalogVersion();
- I'm not sure on this one but incrementAndGetCatalogVersion() seems to be internally taking catalogLock_.writeLock().lock(). Given you are taking it at L1178 seems to be deadlock. Interestingly there are multiple other places in the code that are doing this. Even there are callers that don't take this lock. So its inconsistent and not related to your patch. 

- Dimitris/Alex: Can you please take a look at this once.


http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

Line 2752:           if (req.isSetPartition_spec()) {
Same Preconditions check as my comment in Analyzer (to prevent "invalidate metadata <part_spec>". Reason being we have clients like BDR that uses this Catalog API rather than going via frontend Analyzer.


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 3:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/3385/3//COMMIT_MSG
Commit Message:

PS3, Line 13: query format
> syntax
Done


http://gerrit.cloudera.org:8080/#/c/3385/3/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

PS3, Line 190: 5: optional list<CatalogObjects.TPartitionKeyValue> partition_spec
> Can you plz add a comment?
Done


http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

PS3, Line 35: //null if refreshing the whole table or invalidating the entire catalog.
> Let's reverse this comment :) "not-null when ...."
Done


PS3, Line 41: Preconditions.checkArgument(partitionSpec == null
            :         || (partitionSpec != null && name != null && !name.isEmpty()));
> isn't this equivalent to partitionSpec == null || name != null? I believe t
Done


PS3, Line 46: if (partitionSpec_ != null) {
            :       partitionSpec_.setTableName(tableName_);
            :     }
> single line?
Done


http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS3, Line 1175: HdfsPartition hdfspartition = hdfsTable
              :         .getPartitionFromThriftPartitionSpec(partitionSpec);
              :     // For the case where partition does not exist in Catalog cache but might
              :     // exist in Hive Metastore
              :     String partitionName = hdfspartition == null
              :         ? hdfsTable.getPartitionNameFromThriftPartitionSpec(partitionSpec)
              :         : hdfspartition.getPartitionName();
> You seem to be accessing partition info from a table without protecting it 
Done


PS3, Line 1200: dropPartition(tbl, partitionSpec);
> Why are you calling the dropPartition from this class and not the HdfsTable
Done


PS3, Line 1208: dropPartition(tbl, partitionSpec);
              :         refreshedPartition = hdfsTable.createPartition(partition.getSd(), partition);
              :         addPartition(refreshedPartition);
> You're exposing the process of refreshing a table partition to the reader o
Done


http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS3, Line 1921: this.
> Remove 'this' from here and everywhere else, it's not needed.
Done


PS3, Line 1927: kv.getValue() == null || kv.getValue().isEmpty()
> You can probably use Guava's Strings.isNullOrEmpty(kv.getValue()) here
Done


PS3, Line 1933: if(partitionCols.size()!=partitionVals.size()) return null;
> Are you sure this is possible? Won't we catch it during the analysis of the
Done


Line 1936:         partitionCols,partitionVals);
> nit: space after ','
Done


http://gerrit.cloudera.org:8080/#/c/3385/2/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS2, Line 48: ImpalaDbWrapper
> This is a nicely-implemented context manager for setting up and tearing dow
Done


http://gerrit.cloudera.org:8080/#/c/3385/3/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS3, Line 33: class TestRefreshPartition(ImpalaTestSuite):
> Comment what this class is testing
Done


Line 104: 
> nit: remove empty line
Done


PS3, Line 116: """
             :     Partitions added in Hive can be viewed in Impala after refreshing
             :     partition.
             :     """
> nit: can you keep the comment formatting consistent (e.g. """blah wwfwfw"""
Done


PS3, Line 137: Partitions
> "Partition"
Done


PS3, Line 158: test_add_data_and_refresh
> Another interesting test is to create 2 partitions, insert data through hiv
Done


Line 267:     assert result.data == [str(file_num_rows)]
> You can also continue that test by removing the data file, calling refresh 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 3:

(18 comments)

You also need to add some analysis tests in AnalyzerTest.java (fn TestResetMetadata()).

http://gerrit.cloudera.org:8080/#/c/3385/3//COMMIT_MSG
Commit Message:

PS3, Line 13: query format
syntax


http://gerrit.cloudera.org:8080/#/c/3385/3/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

PS3, Line 190: 5: optional list<CatalogObjects.TPartitionKeyValue> partition_spec
Can you plz add a comment?


http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

PS3, Line 35: //null if refreshing the whole table or invalidating the entire catalog.
Let's reverse this comment :) "not-null when ...."


PS3, Line 41: Preconditions.checkArgument(partitionSpec == null
            :         || (partitionSpec != null && name != null && !name.isEmpty()));
isn't this equivalent to partitionSpec == null || name != null? I believe the indention of this check should be that you can't have a partition spec without a valid table name, no?


PS3, Line 46: if (partitionSpec_ != null) {
            :       partitionSpec_.setTableName(tableName_);
            :     }
single line?


http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS3, Line 1175: HdfsPartition hdfspartition = hdfsTable
              :         .getPartitionFromThriftPartitionSpec(partitionSpec);
              :     // For the case where partition does not exist in Catalog cache but might
              :     // exist in Hive Metastore
              :     String partitionName = hdfspartition == null
              :         ? hdfsTable.getPartitionNameFromThriftPartitionSpec(partitionSpec)
              :         : hdfspartition.getPartitionName();
You seem to be accessing partition info from a table without protecting it from other concurrent accesses (e.g. someone could be dropping the partition you're trying to refresh). I believe this section should be inside the synchronized block.


PS3, Line 1200: dropPartition(tbl, partitionSpec);
Why are you calling the dropPartition from this class and not the HdfsTable.dropPartition()?


PS3, Line 1208: dropPartition(tbl, partitionSpec);
              :         refreshedPartition = hdfsTable.createPartition(partition.getSd(), partition);
              :         addPartition(refreshedPartition);
You're exposing the process of refreshing a table partition to the reader of this class, which is not ideal. This task belongs to the table, so I would expect something like hdfsTable.refreshPartition(partition); call here.


http://gerrit.cloudera.org:8080/#/c/3385/3/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS3, Line 1921: this.
Remove 'this' from here and everywhere else, it's not needed.


PS3, Line 1927: kv.getValue() == null || kv.getValue().isEmpty()
You can probably use Guava's Strings.isNullOrEmpty(kv.getValue()) here


PS3, Line 1933: if(partitionCols.size()!=partitionVals.size()) return null;
Are you sure this is possible? Won't we catch it during the analysis of the PartitionSpec? If that's the case, we should convert this into a check.


Line 1936:         partitionCols,partitionVals);
nit: space after ','


http://gerrit.cloudera.org:8080/#/c/3385/3/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS3, Line 33: class TestRefreshPartition(ImpalaTestSuite):
Comment what this class is testing


Line 104: 
nit: remove empty line


PS3, Line 116: """
             :     Partitions added in Hive can be viewed in Impala after refreshing
             :     partition.
             :     """
nit: can you keep the comment formatting consistent (e.g. """blah wwfwfw"""


PS3, Line 137: Partitions
"Partition"


PS3, Line 158: test_add_data_and_refresh
Another interesting test is to create 2 partitions, insert data through hive to both of them, refresh one and ensure that only data in one partition is visible to impala.


Line 267:     assert result.data == [str(file_num_rows)]
You can also continue that test by removing the data file, calling refresh and validating that no rows get returned to Impala.


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 7: Code-Review+1

The python test file looks good from my perspective.

-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 8: Code-Review+1

Carrying forward David's +1

-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 1180:       long newCatalogVersion = incrementAndGetCatalogVersion();
> - I'm not sure on this one but incrementAndGetCatalogVersion() seems to be 
CatalogLock_ is a reentrant lock, so this behavior is fine (i.e. no deadlock). Any function that modifies table metadata should follow the locking protocol which is described in the CatalogOpExecutor class. The protocol specifies the order in which tbl and catalog locks should be acquired and released, respectively. Plz let me know if there is a function that violates that.


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 8:

Can you also run a simple perf experiment? Can you create a table with lot's of partitions (~20-30k) and 1-2 files per partition and compare the performance of refresh and per-partition refresh.

-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 8:

(1 comment)

Carrying forward David's +1

http://gerrit.cloudera.org:8080/#/c/3385/8/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS8, Line 1947:  the file and block metadata of
> Sorry, I know I made you do this changes but can you remove it? There are m
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "David Knupp (Code Review)" <ge...@cloudera.org>.
David Knupp has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3385/9/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS9, Line 27: class ImpalaTableWrapper(object):
> this seems useful! I wonder if there's a better place to put it so that oth
So, if we were to make something like this class generally available, it should probably go into tests/conftest.py. (Note that this wrapper was taken from tests/metadata/test_hms_integration.py, and there are a few other wrappers there like it.)

In conftest.py, there's also "unique_database," which does something very similar to this context manager, but it has been written somewhat differently -- as a pytest fixture. So we might need to decide upon a standard approach for making these sorts of resources generally available.

What would you say to submitting this code as is, and then opening an enhancement ticket against the test framework? We can put in a note to refactor this module as well as test_hms_integration once that enhancement has been written?

(Going to ping mikeb to weigh in as well.)


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 13: Code-Review+2

Carrying forward Henry's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 13
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 11: Code-Review+1

(1 comment)

Henry do you want to give the final +2? The changes look good from my side.

http://gerrit.cloudera.org:8080/#/c/3385/11/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

Line 41:     Preconditions.checkArgument((!isRefresh && partitionSpec==null) || name != null);
I'm not sure this condition is quire right. If name is not null we can still have !isRefresh && partitionSpec != null

spaces before and after == for consistent style

might be simpler to add a separate preconditions check


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Hello David Knupp, Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3385

to look at the new patch set (#12).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Performance:
For a table with 10000 partitions and 1 file per partition

                     execResetMetadata()       Total Execution Time

Refresh Table              3795 ms                   4630 ms

Refersh Partition            42 ms                    680 ms

We see that the time to refresh improves by a factor of 90x but due to
significant overhead of about 640ms in this case the effective improvement
is about 7x. As the size of the table and number of partitions increase.
This improvement would be more significant.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
11 files changed, 408 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/12
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 12
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/3385/5/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

PS5, Line 42: Preconditions.checkArgument(!isRefresh || name != null);
            :     this.tableName_ = name;
> Done
Did you remove the second check?


http://gerrit.cloudera.org:8080/#/c/3385/5/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS5, Line 1174: Db db = tbl.getDb();
> These 3 variables can be moved to L1195 but I kept them here because I want
These operations are very cheap and the synchronized block protects a specific table object so I am not really worried about them. So, lets move them closer to where they've been used to improve the readability of this function.


http://gerrit.cloudera.org:8080/#/c/3385/6/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS6, Line 1164: To achieve that, it fetches a fresh copy of
              :    * the partition from Hive metastore and passes to it the table object to
              :    * reload the metadata from it
No need to describe the implementation unless it is something non-intuitive that a reader will not be able to figure out by reading the code.


PS6, Line 1182: getPartitionFromThriftPartitionSpec
I believe we should name this getPartition(). The name simply describes the type of the input param which is kind of redundant.


http://gerrit.cloudera.org:8080/#/c/3385/6/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS6, Line 1926: public String constructPartitionName(List<TPartitionKeyValue> partitionSpec) {
Can you add a function comment?


PS6, Line 1943: from the Partition object
maybe "Reloads the file and block metadata of a partition by dropping it and adding it back to the table". You need to comment on the input params. It's not clear why you need the partitionSpec even though you have the partition object.


http://gerrit.cloudera.org:8080/#/c/3385/6/fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java:

PS6, Line 785: //Positive cases for checking refresh partition
             :     AuthzOk("refresh functional.alltypesagg partition (year=2010, month =1, day = 1)");
             :     AuthzOk("refresh functional.alltypes partition (year =2009, month=1)");
             :     AuthzOk("refresh functional_seq_snap.alltypes partition (year =2009, month=1)");
You also need to add some analysis tests in AnalyzerTest.java (fn TestResetMetadata()).


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Hello David Knupp, Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3385

to look at the new patch set (#9).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Performance:
For a table with 10000 partitions and 1 file per partition

                     execResetMetadata()       Total Execution Time

Refresh Table              3795 ms                   4630 ms

Refersh Partition            42 ms                    680 ms

We see that the time to refresh improves by a factor of 90x but due to
significant overhead of about 640ms in this case the effective improvement
is about 7x. As the size of the table and number of partitions increase.
This improvement would be more significant.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
11 files changed, 450 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/9
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 7:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/3385/6/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS6, Line 1182:  hdfspartition.getPartitionName();
> this method was already there in the hdfs table class. the reason i think t
Sorry didn't realize there was another one with this input param. That's fine, let's leave it as is.


http://gerrit.cloudera.org:8080/#/c/3385/7/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS7, Line 1172: HdfsTable hdfsTable = ((HdfsTable) tbl);
Move above L1176


PS7, Line 1173: Db db = tbl.getDb();
              :       TTableName tblName = new TTableName(tbl.getDb().getName().toLowerCase(),
              :           tbl.getName().toLowerCase());
Move above L1187


PS7, Line 1189: refreshedPartition
Why refreshed? This is just the HMS partition object. You can simply call it hmsPartition.


http://gerrit.cloudera.org:8080/#/c/3385/7/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS7, Line 1926: Returns the name of the partition described by the input thrift
              :   // partition spec object
How about "Constructs a partition name from a list of TPartitionKeyValue objects."


PS7, Line 1945: of a partition by removing the old
              :   // metadata using the 'oldPartition' object and adding back fresh metadata to
              :   // the table using the newly fetched 'refershedPartition' object
How about "of partition 'oldPartition' by removing it from the table and reconstructing it from the HMS partition object 'hmsPartition'".


PS7, Line 1948: refreshPartition
Maybe rename to 'reloadPartition'.


PS7, Line 1949: refershedPartition
Can you change this to hmsPartition? This is not a refreshed HdfsTable::HdfsPartition object that you simply use to replace oldPartition. It is an HMS partition object based on which you reconstruct the HdfsPartition object that replaces oldPartition.


http://gerrit.cloudera.org:8080/#/c/3385/7/fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java:

PS7, Line 659: AnalysisError("refresh functional.alltypessmall partition (year=2009, int_col=10)",
Can you also add the cases:
a) the same partition column shows up multiple times e.g. refresh functional.alltypessmall partition (year = 2009, year=2010); 
b) no partition columns are used
c) the partition key value is not compatible with the partition column e.g. year = "dimitris"
d) partition key value is NULL, e.g. year = NULL


PS7, Line 664:     
remove extra space


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 9: Code-Review+1

Carrying forward David's +1

-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has uploaded a new patch set (#4).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
10 files changed, 442 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 1180:       long newCatalogVersion = incrementAndGetCatalogVersion();
> It's a ReentrantReadWriteLock so you can acquire the lock again if you alre
ok.. I didn't realize its a reentrant lock.  Also now that I check the code, every invocation of incrementAndGetCatalogVersion() seems to be following this. Please ignore the above comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 9:

(17 comments)

Overall, this looks pretty high quality. Well done.

http://gerrit.cloudera.org:8080/#/c/3385/9/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS9, Line 1182: ((HdfsTable) tbl);
nit: extraneous parentheses


Line 1185:       // Retrieve partition name from existing partition or construct it from
does it make sense to do some basic sanity testing here? like checking the number of columns is equal to the table's number of clustering columns. 

The concern I have is that all the code below gets executed even with a malformed partition spec. I'd rather detect that case and early-exit from this method. You could add HdfsTable.isCompatiblePartitionSpec() or something to do this checking for you.


PS9, Line 1204: if (hdfspartition != null) {
              :             hdfsTable.dropPartition(partitionSpec);
              :             tbl.setCatalogVersion(newCatalogVersion);
              :           }
is this path covered by your tests at all?


http://gerrit.cloudera.org:8080/#/c/3385/9/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS9, Line 1929: public String
can this be static?


PS9, Line 1931: for (int i = 0; i < getNumClusteringCols(); ++i) {
              :       partitionCols.add(getColumns().get(i).getName());
              :     }
seems like a good idea to add Table.getClusteringCols() for this purpose.


PS9, Line 1936: kv :
tiny nit: we don't put a space before a colon in for expressions like this.


PS9, Line 1952: Partition hmsPartition)
can this parameter fit on the previous line?


Line 1953:     dropPartition(oldPartition);
can you check (via precondition) that oldPartition and hmsPartition refer to the same partition?


http://gerrit.cloudera.org:8080/#/c/3385/9/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS9, Line 21: nsion, \
long line


PS9, Line 27: class ImpalaTableWrapper(object):
this seems useful! I wonder if there's a better place to put it so that others can take advantage.


PS9, Line 41: (self.table_name, self.table_spec))
combine with previous line?


PS9, Line 109: cant
can't


PS9, Line 168: s
exist


PS9, Line 172: db_name + '.' + self.unique_string()
since you do this on every (?) construction of an ImpalaTableWrapper, why not pass the db and table name as separate parameters, and ITW do the concatenation?


PS9, Line 266: \
nit: remove


PS9, Line 281: dst_path = "%s/year=2010/month=1/%s" % (table_location, file_name)
             :     check_call(["hadoop", "fs", "-cp", "-f", src_file, dst_path], shell=False)
             :     dst_path = "%s/year=2010/month=2/%s" % (table_location, file_name)
             :     check_call(["hadoop", "fs", "-cp", "-f", src_file, dst_path], shell=False)
nit: it's maybe worth, in this case, replacing this with a two or three liner to make the code less dense:

  dst_path = table_location + "/year=2010/month=%s" + file_name
  for month in [1, 2]:
    check_call["hadoop", "fs", "-cp", "-f", src_file, dst_path % month], shell=False)


PS9, Line 295: %
long line


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 8: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3385/8/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS8, Line 1947:  the file and block metadata of
Sorry, I know I made you do this changes but can you remove it? There are more to it than just files and block metadata.


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 9:

(17 comments)

Note: I will be moving this check-in to the ASF gerrit repo as soon as I get +2

http://gerrit.cloudera.org:8080/#/c/3385/9/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS9, Line 1182: ((HdfsTable) tbl);
> nit: extraneous parentheses
Done


Line 1185:       // Retrieve partition name from existing partition or construct it from
> does it make sense to do some basic sanity testing here? like checking the 
the partition spec we receive here already goes through analysis check after parsing step. If there was any incompatibility then it would be caught there


PS9, Line 1204: if (hdfspartition != null) {
              :             hdfsTable.dropPartition(partitionSpec);
              :             tbl.setCatalogVersion(newCatalogVersion);
              :           }
> is this path covered by your tests at all?
Yes, this is covered by 'test_drop_hive_partition_and_refresh' in test_refresh_partition.py


http://gerrit.cloudera.org:8080/#/c/3385/9/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS9, Line 1929: public String
> can this be static?
Done


PS9, Line 1931: for (int i = 0; i < getNumClusteringCols(); ++i) {
              :       partitionCols.add(getColumns().get(i).getName());
              :     }
> seems like a good idea to add Table.getClusteringCols() for this purpose.
I realized that we don't really need anything from the HdfsTable object, this makes it simpler to change this method to static and only use the  TPartitionKeyValue objects to construct the name


PS9, Line 1936: kv :
> tiny nit: we don't put a space before a colon in for expressions like this.
Done


PS9, Line 1952: Partition hmsPartition)
> can this parameter fit on the previous line?
Done


Line 1953:     dropPartition(oldPartition);
> can you check (via precondition) that oldPartition and hmsPartition refer t
Done


http://gerrit.cloudera.org:8080/#/c/3385/9/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS9, Line 21: nsion, \
> long line
Done


PS9, Line 27: class ImpalaTableWrapper(object):
> My take is that while this could be useful for some test code, it's a bit s
No, I don't anticipate any such test. Therefore I think the best thing would be to remove this class from here.


PS9, Line 41: (self.table_name, self.table_spec))
> combine with previous line?
Done


PS9, Line 109: cant
> can't
Done


PS9, Line 168: s
> exist
Done


PS9, Line 172: db_name + '.' + self.unique_string()
> since you do this on every (?) construction of an ImpalaTableWrapper, why n
Done


PS9, Line 266: \
> nit: remove
Done


PS9, Line 281: dst_path = "%s/year=2010/month=1/%s" % (table_location, file_name)
             :     check_call(["hadoop", "fs", "-cp", "-f", src_file, dst_path], shell=False)
             :     dst_path = "%s/year=2010/month=2/%s" % (table_location, file_name)
             :     check_call(["hadoop", "fs", "-cp", "-f", src_file, dst_path], shell=False)
> nit: it's maybe worth, in this case, replacing this with a two or three lin
Done


PS9, Line 295: %
> long line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 12: Code-Review+2

lgtm pending my last few superficial comments

-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 12
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has uploaded a new patch set (#6).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
10 files changed, 439 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/6
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3385/10/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

Line 1180:       long newCatalogVersion = incrementAndGetCatalogVersion();
> CatalogLock_ is a reentrant lock, so this behavior is fine (i.e. no deadloc
It's a ReentrantReadWriteLock so you can acquire the lock again if you already have it.


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 11: Code-Review+1

(1 comment)

Carrying forward David's +1

http://gerrit.cloudera.org:8080/#/c/3385/11/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

Line 41:     Preconditions.checkArgument((!isRefresh && partitionSpec==null) || name != null);
> I'm not sure this condition is quire right. If name is not null we can stil
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 11
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Hello David Knupp, Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3385

to look at the new patch set (#10).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Performance:
For a table with 10000 partitions and 1 file per partition

                     execResetMetadata()       Total Execution Time

Refresh Table              3795 ms                   4630 ms

Refersh Partition            42 ms                    680 ms

We see that the time to refresh improves by a factor of 90x but due to
significant overhead of about 640ms in this case the effective improvement
is about 7x. As the size of the table and number of partitions increase.
This improvement would be more significant.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
11 files changed, 405 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/10
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/3385/7/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS7, Line 1172: HdfsTable hdfsTable = ((HdfsTable) tbl);
> Move above L1176
Done


PS7, Line 1173: Db db = tbl.getDb();
              :       TTableName tblName = new TTableName(tbl.getDb().getName().toLowerCase(),
              :           tbl.getName().toLowerCase());
> Move above L1187
Done


PS7, Line 1189: refreshedPartition
> Why refreshed? This is just the HMS partition object. You can simply call i
Done


http://gerrit.cloudera.org:8080/#/c/3385/7/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS7, Line 1926: Returns the name of the partition described by the input thrift
              :   // partition spec object
> How about "Constructs a partition name from a list of TPartitionKeyValue ob
Done


PS7, Line 1945: of a partition by removing the old
              :   // metadata using the 'oldPartition' object and adding back fresh metadata to
              :   // the table using the newly fetched 'refershedPartition' object
> How about "of partition 'oldPartition' by removing it from the table and re
Done


PS7, Line 1948: refreshPartition
> Maybe rename to 'reloadPartition'.
Done


PS7, Line 1949: refershedPartition
> Can you change this to hmsPartition? This is not a refreshed HdfsTable::Hdf
Done


http://gerrit.cloudera.org:8080/#/c/3385/7/fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java:

PS7, Line 659: AnalysisError("refresh functional.alltypessmall partition (year=2009, int_col=10)",
> Can you also add the cases:
all added except 'b)', since its a syntax check I added it in ParserTest instead


PS7, Line 664:     
> remove extra space
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 6:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/3385/5/fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java:

PS5, Line 42: Preconditions.checkArgument(!isRefresh || name != null);
            :     this.tableName_ = name;
> Did you remove the second check?
Yes, I realized that the first condition makes sure that the table name is not null if its a refresh statement, therefore my added check was redundant


http://gerrit.cloudera.org:8080/#/c/3385/5/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS5, Line 1174: Db db = tbl.getDb();
> These operations are very cheap and the synchronized block protects a speci
Done


http://gerrit.cloudera.org:8080/#/c/3385/6/fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java:

PS6, Line 1164: To achieve that, it fetches a fresh copy of
              :    * the partition from Hive metastore and passes to it the table object to
              :    * reload the metadata from it
> No need to describe the implementation unless it is something non-intuitive
Done


PS6, Line 1182: getPartitionFromThriftPartitionSpec
> I believe we should name this getPartition(). The name simply describes the
this method was already there in the hdfs table class. the reason i think they did this was because there is already a method 'getPartition()' that takes input of type List<PartitionKeyValue>, if I rename this method as well, it would result in an error since this one takes input of type List<TPartitionKeyValue> which is also of List<> type.
Would you suggest I create a single getPartition method which calls the right method according to the generic type of the List<> ?


http://gerrit.cloudera.org:8080/#/c/3385/6/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

PS6, Line 1926: public String constructPartitionName(List<TPartitionKeyValue> partitionSpec) {
> Can you add a function comment?
Done


PS6, Line 1943: from the Partition object
> maybe "Reloads the file and block metadata of a partition by dropping it an
Done


http://gerrit.cloudera.org:8080/#/c/3385/6/fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
File fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java:

PS6, Line 785: //Positive cases for checking refresh partition
             :     AuthzOk("refresh functional.alltypesagg partition (year=2010, month =1, day = 1)");
             :     AuthzOk("refresh functional.alltypes partition (year =2009, month=1)");
             :     AuthzOk("refresh functional_seq_snap.alltypes partition (year =2009, month=1)");
> You also need to add some analysis tests in AnalyzerTest.java (fn TestReset
Done


http://gerrit.cloudera.org:8080/#/c/3385/6/tests/metadata/test_refresh_partition.py
File tests/metadata/test_refresh_partition.py:

PS6, Line 16: import logging
            : import pytest
> Removed unused imports
Done


PS6, Line 19: import shlex
> Unused import
Done


PS6, Line 21: import subprocess
> Unused import
Done


PS6, Line 23: import *
> "import *" is generally regarded as bad practice in python. It's better to 
Done


PS6, Line 52:   class ImpalaTableWrapper(object):
> Sorry -- I didn't notice before that this class is nested within the outer 
Done


Line 82:     for row in rows:
> This is perfectly well formed, although it would have been also possible to
Done


PS6, Line 99: range(0, 16)
> You actually don't need the 0, and in python 2.x, it's more customary to us
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Hello David Knupp,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3385

to look at the new patch set (#8).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AnalyzerTest.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
11 files changed, 450 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/8
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................


Patch Set 10:

Will wait until you've addressed Alex's comments, but this looks in pretty good shape to me.

-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has uploaded a new patch set (#2).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following query format:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
10 files changed, 393 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has uploaded a new patch set (#5).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following syntax:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
10 files changed, 441 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: David Knupp <dk...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-1683: Allow REFRESH on a single partition

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has uploaded a new patch set (#3).

Change subject: IMPALA-1683: Allow REFRESH on a single partition
......................................................................

IMPALA-1683: Allow REFRESH on a single partition

Currently the only way to refresh metadata for a partition was to refresh
the whole table. This is a relatively time consuming process especially if
there are many partitions and only one is to be refreshed.
This patch allows the client to REFRESH on a single partition by using the
following query format:
REFRESH [database_name.]table_name PARTITION (partition_spec)

Testing:
Added parsing and authorization tests in ParserTest.java and
AuthorizationTest.java respectively. A new test file
"test_refresh_partition.py" was added for testing functionality.

Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
---
M common/thrift/CatalogService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/AnalysisContext.java
M fe/src/main/java/com/cloudera/impala/analysis/ResetMetadataStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/analysis/AuthorizationTest.java
M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java
A tests/metadata/test_refresh_partition.py
10 files changed, 392 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/85/3385/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3385
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4bf5dd630f2e175be5800187bbf9337c91f82bd8
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>