Posted to dev@hive.apache.org by Navis Ryu <na...@nexr.com> on 2014/04/02 08:04:55 UTC

Review Request 19903: Support bulk deleting directories for partition drop with partial spec

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19903/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-6809
    https://issues.apache.org/jira/browse/HIVE-6809


Repository: hive-git


Description
-------

On a busy Hadoop cluster, dropping many partitions takes much longer than expected. In hive-0.11.0, removing 1700 partitions with a single partial spec took 90 minutes, which dropped to 3 minutes when deleteData was set to false. I couldn't test this on a recent Hive that includes HIVE-6256, but if most of the time is spent removing directories, that change seems unlikely to reduce the overall processing time by much.
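
To illustrate why a partial spec lends itself to bulk deletion: assuming the default Hive directory layout (key=value path components in partition-key order), every partition matching a partial spec that fixes the leading keys lives under one common directory, so a single recursive delete of that directory could stand in for one delete per partition. The sketch below only computes that common directory. The class and method names are hypothetical, it ignores value escaping and custom partition locations, and it is not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

public class PartialSpecPrefix {
    /**
     * Builds the directory shared by all partitions that match a partial spec
     * fixing the first vals.size() partition keys (assumed default layout).
     */
    public static String commonDirectory(String tableRoot, List<String> keys, List<String> vals) {
        if (vals.isEmpty() || vals.size() > keys.size()) {
            throw new IllegalArgumentException(
                "Expected between 1 and " + keys.size() + " values, got " + vals.size());
        }
        StringBuilder path = new StringBuilder(tableRoot);
        // Append one key=value component per fixed leading key.
        for (int i = 0; i < vals.size(); i++) {
            path.append('/').append(keys.get(i)).append('=').append(vals.get(i));
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Table partitioned by (ds, hr); the partial spec fixes only ds.
        String dir = commonDirectory(
            "/warehouse/logs", Arrays.asList("ds", "hr"), Arrays.asList("2014-04-01"));
        System.out.println(dir); // prints /warehouse/logs/ds=2014-04-01
    }
}
```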


Diffs
-----

  itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 47e94ea 
  metastore/if/hive_metastore.thrift eef1b80 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 2a1b4d7 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 9567874 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp b18009c 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 4f051af 
  metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php c79624f 
  metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote fdedb57 
  metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 23679be 
  metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 56c23e6 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 27077b4 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 664dccd 
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 0c2209b 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6a0eabe 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java e0de0e0 
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 5c00aa1 
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 5025b83 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 5cb030c 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 5d5fa78 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java a73a5e0 
  ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f 
  ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionSpec.java PRE-CREATION 
  ql/src/test/queries/clientpositive/drop_partitions_partialspec.q PRE-CREATION 
  ql/src/test/results/clientpositive/drop_partitions_partialspec.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/19903/diff/


Testing
-------


Thanks,

Navis Ryu


Re: Review Request 19903: Support bulk deleting directories for partition drop with partial spec

Posted by Navis Ryu <na...@nexr.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19903/
-----------------------------------------------------------

(Updated April 9, 2014, 3:26 a.m.)


Review request for hive.


Changes
-------

Fixed test failure


Bugs: HIVE-6809
    https://issues.apache.org/jira/browse/HIVE-6809


Repository: hive-git


Description
-------

On a busy Hadoop cluster, dropping many partitions takes much longer than expected. In hive-0.11.0, removing 1700 partitions with a single partial spec took 90 minutes, which dropped to 3 minutes when deleteData was set to false. I couldn't test this on a recent Hive that includes HIVE-6256, but if most of the time is spent removing directories, that change seems unlikely to reduce the overall processing time by much.


Diffs (updated)
-----

  hcatalog/core/src/main/java/org/apache/hcatalog/cli/SemanticAnalysis/HCatSemanticAnalyzer.java d348b9b 
  metastore/if/hive_metastore.thrift eef1b80 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 2a1b4d7 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 9567874 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp b18009c 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 4f051af 
  metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php c79624f 
  metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote fdedb57 
  metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 23679be 
  metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 56c23e6 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 18e62d8 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 664dccd 
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 0c2209b 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6a0eabe 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java e0de0e0 
  metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java f731dab 
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 5c00aa1 
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 5025b83 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 5cb030c 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e6cb70f 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java a40a88d 
  ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f 
  ql/src/test/queries/clientpositive/drop_partitions_partialspec.q PRE-CREATION 
  ql/src/test/results/clientnegative/drop_partition_failure.q.out cde0abb 
  ql/src/test/results/clientnegative/drop_partition_filter_failure.q.out c4f533b 
  ql/src/test/results/clientpositive/drop_multi_partitions.q.out eae57f3 
  ql/src/test/results/clientpositive/drop_partitions_partialspec.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/19903/diff/


Testing
-------


Thanks,

Navis Ryu


Re: Review Request 19903: Support bulk deleting directories for partition drop with partial spec

Posted by Navis Ryu <na...@nexr.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19903/
-----------------------------------------------------------

(Updated April 3, 2014, 2:09 a.m.)


Review request for hive.


Changes
-------

Addressed comments and fixed test failures


Bugs: HIVE-6809
    https://issues.apache.org/jira/browse/HIVE-6809


Repository: hive-git


Description
-------

On a busy Hadoop cluster, dropping many partitions takes much longer than expected. In hive-0.11.0, removing 1700 partitions with a single partial spec took 90 minutes, which dropped to 3 minutes when deleteData was set to false. I couldn't test this on a recent Hive that includes HIVE-6256, but if most of the time is spent removing directories, that change seems unlikely to reduce the overall processing time by much.


Diffs (updated)
-----

  metastore/if/hive_metastore.thrift eef1b80 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 2a1b4d7 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 9567874 
  metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp b18009c 
  metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 4f051af 
  metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php c79624f 
  metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote fdedb57 
  metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 23679be 
  metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 56c23e6 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 27077b4 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 664dccd 
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 0c2209b 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6a0eabe 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java e0de0e0 
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 5c00aa1 
  metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 5025b83 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 5cb030c 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 5d5fa78 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java a73a5e0 
  ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f 
  ql/src/test/queries/clientpositive/drop_partitions_partialspec.q PRE-CREATION 
  ql/src/test/results/clientnegative/drop_partition_failure.q.out b94c6c2 
  ql/src/test/results/clientnegative/drop_partition_filter_failure.q.out 0ab5e02 
  ql/src/test/results/clientpositive/drop_multi_partitions.q.out 735920b 
  ql/src/test/results/clientpositive/drop_partitions_partialspec.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/19903/diff/


Testing
-------


Thanks,

Navis Ryu


Re: Review Request 19903: Support bulk deleting directories for partition drop with partial spec

Posted by Navis Ryu <na...@nexr.com>.

> On April 2, 2014, 8:59 p.m., Sergey Shelukhin wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, line 2273
> > <https://reviews.apache.org/r/19903/diff/1/?file=544980#file544980line2273>
> >
> >     no check for archived?

Ah, missed that.


> On April 2, 2014, 8:59 p.m., Sergey Shelukhin wrote:
> > metastore/if/hive_metastore.thrift, line 766
> > <https://reviews.apache.org/r/19903/diff/1/?file=544971#file544971line766>
> >
> >     this is not a backward compatible change. Have you considered modifying the new API that uses req/resp pattern? Otherwise, new calls will have to be added.

I wanted to avoid adding more methods to the metastore (ThriftHiveMetastore already has 125K lines of code). Callers of drop_partition could use drop_partition_by_name instead. I'll add those methods to the metastore client.


- Navis


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19903/#review39347
-----------------------------------------------------------




Re: Review Request 19903: Support bulk deleting directories for partition drop with partial spec

Posted by Sergey Shelukhin <se...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19903/#review39347
-----------------------------------------------------------



metastore/if/hive_metastore.thrift
<https://reviews.apache.org/r/19903/#comment71707>

    this is not a backward compatible change. Have you considered modifying the new API that uses req/resp pattern? Otherwise, new calls will have to be added.
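
As a rough illustration of the req/resp pattern mentioned here: when a call takes a single request object, a new option can be added as an optional field with a default, so callers written against the old shape keep working, whereas adding a parameter to an existing method signature breaks them. The Java sketch below uses hypothetical class and field names throughout and is not the metastore's Thrift API.

```java
import java.util.Arrays;
import java.util.List;

public class ReqRespSketch {
    /** Hypothetical request object; new options get defaults so old callers are unaffected. */
    public static class DropRequest {
        public final String table;
        public final List<String> partialVals;
        public boolean deleteData = true;   // option that existed from the start
        public boolean bulkDelete = false;  // option added later, off unless set

        public DropRequest(String table, List<String> partialVals) {
            this.table = table;
            this.partialVals = partialVals;
        }
    }

    /** Hypothetical response object; fields can be added later in the same way. */
    public static class DropResponse {
        public final String summary;

        public DropResponse(String summary) {
            this.summary = summary;
        }
    }

    /** Single entry point whose signature never changes as options are added. */
    public static DropResponse drop(DropRequest req) {
        String mode = req.bulkDelete ? "bulk" : "per-partition";
        return new DropResponse(req.table + ":" + mode + ":deleteData=" + req.deleteData);
    }

    public static void main(String[] args) {
        // A caller that predates bulkDelete still compiles and behaves as before.
        DropRequest oldStyle = new DropRequest("logs", Arrays.asList("2014-04-01"));
        System.out.println(drop(oldStyle).summary);

        // A newer caller opts in to the added option.
        DropRequest newStyle = new DropRequest("logs", Arrays.asList("2014-04-01"));
        newStyle.bulkDelete = true;
        System.out.println(drop(newStyle).summary);
    }
}
```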



metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
<https://reviews.apache.org/r/19903/#comment71708>

    nit - verify that there are fewer vals than keys?



metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
<https://reviews.apache.org/r/19903/#comment71710>

    no check for archived?



metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
<https://reviews.apache.org/r/19903/#comment71711>

    nit: typo



ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
<https://reviews.apache.org/r/19903/#comment71712>

    asserts can be disabled in production, so this probably needs to check and throw
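
The point about asserts can be shown with a small sketch: Java `assert` statements are skipped unless the JVM is started with `-ea`, so a precondition that must hold in production needs an explicit check. The class, method, and message below are hypothetical and not taken from DDLTask; the condition borrows the vals-versus-keys nit from earlier in this review purely as an example.

```java
import java.util.Arrays;
import java.util.List;

public class PartialSpecCheck {
    /** Throws instead of asserting, so the check also runs when assertions are disabled. */
    public static void validate(List<String> keys, List<String> vals) {
        // An `assert vals.size() <= keys.size();` here would do nothing without -ea.
        if (vals.size() > keys.size()) {
            throw new IllegalArgumentException(
                "Partial spec has " + vals.size() + " values but the table has only "
                    + keys.size() + " partition keys");
        }
    }

    public static void main(String[] args) {
        // One value for two keys: accepted.
        validate(Arrays.asList("ds", "hr"), Arrays.asList("2014-04-01"));
        try {
            // Two values for one key: rejected regardless of JVM flags.
            validate(Arrays.asList("ds"), Arrays.asList("2014-04-01", "12"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```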



ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java
<https://reviews.apache.org/r/19903/#comment71713>

    should these two explains have different names?



ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionSpec.java
<https://reviews.apache.org/r/19903/#comment71714>

    why is this needed? I don't see it used anywhere


- Sergey Shelukhin

