Posted to dev@hive.apache.org by Navis Ryu <na...@nexr.com> on 2014/04/02 08:04:55 UTC
Review Request 19903: Support bulk deleting directories for partition drop
with partial spec
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19903/
-----------------------------------------------------------
Review request for hive.
Bugs: HIVE-6809
https://issues.apache.org/jira/browse/HIVE-6809
Repository: hive-git
Description
-------
On a busy Hadoop cluster, dropping many partitions takes much longer than expected. In hive-0.11.0, removing 1700 partitions with a single partial spec took 90 minutes, which dropped to 3 minutes when deleteData was set to false. I couldn't test this on a recent Hive that includes HIVE-6256, but if most of the time is spent removing directories, that change seems unlikely to reduce the overall processing time much.
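To illustrate why one directory delete can stand in for many: assuming partitions use the default layout (the table location followed by key=value path components in partition-key order), every partition matched by a leading partial spec sits under one common directory. The sketch below only computes that directory. The class and method names are hypothetical and not taken from the patch, and it ignores value escaping and partitions with custom locations.

```java
import java.util.Arrays;
import java.util.List;

public class PartialSpecPath {
    /**
     * Returns the directory that contains every partition matching a leading
     * partial spec, assuming the default "key=value" directory layout.
     * Hypothetical helper, for illustration only.
     */
    public static String commonDirectory(String tableLocation,
                                         List<String> partKeys,
                                         List<String> partVals) {
        // A partial spec supplies values for a prefix of the partition keys.
        if (partVals.isEmpty() || partVals.size() > partKeys.size()) {
            throw new IllegalArgumentException(
                "expected between 1 and " + partKeys.size()
                    + " values, got " + partVals.size());
        }
        StringBuilder path = new StringBuilder(tableLocation);
        for (int i = 0; i < partVals.size(); i++) {
            path.append('/').append(partKeys.get(i))
                .append('=').append(partVals.get(i));
        }
        return path.toString();
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("ds", "hr");
        // All hr=* partitions for this ds value live under the printed directory.
        System.out.println(
            commonDirectory("/warehouse/t", keys, Arrays.asList("2014-04-01")));
    }
}
```

Under these assumptions, a table partitioned by (ds, hr) and dropped with only ds specified needs a single recursive delete of the ds directory rather than one delete per hr partition.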
Diffs
-----
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java 47e94ea
metastore/if/hive_metastore.thrift eef1b80
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 2a1b4d7
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 9567874
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp b18009c
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 4f051af
metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php c79624f
metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote fdedb57
metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 23679be
metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 56c23e6
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 27077b4
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 664dccd
metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 0c2209b
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6a0eabe
metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java e0de0e0
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 5c00aa1
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 5025b83
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 5cb030c
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 5d5fa78
ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java a73a5e0
ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionSpec.java PRE-CREATION
ql/src/test/queries/clientpositive/drop_partitions_partialspec.q PRE-CREATION
ql/src/test/results/clientpositive/drop_partitions_partialspec.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/19903/diff/
Testing
-------
Thanks,
Navis Ryu
Re: Review Request 19903: Support bulk deleting directories for partition
drop with partial spec
Posted by Navis Ryu <na...@nexr.com>.
(Updated April 9, 2014, 3:26 a.m.)
Review request for hive.
Changes
-------
Fixed test failure
Bugs: HIVE-6809
https://issues.apache.org/jira/browse/HIVE-6809
Repository: hive-git
Description
-------
On a busy Hadoop cluster, dropping many partitions takes much longer than expected. In hive-0.11.0, removing 1700 partitions with a single partial spec took 90 minutes, which dropped to 3 minutes when deleteData was set to false. I couldn't test this on a recent Hive that includes HIVE-6256, but if most of the time is spent removing directories, that change seems unlikely to reduce the overall processing time much.
Diffs (updated)
-----
hcatalog/core/src/main/java/org/apache/hcatalog/cli/SemanticAnalysis/HCatSemanticAnalyzer.java d348b9b
metastore/if/hive_metastore.thrift eef1b80
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 2a1b4d7
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 9567874
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp b18009c
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 4f051af
metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php c79624f
metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote fdedb57
metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 23679be
metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 56c23e6
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 18e62d8
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 664dccd
metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 0c2209b
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6a0eabe
metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java e0de0e0
metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java f731dab
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 5c00aa1
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 5025b83
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 5cb030c
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java e6cb70f
ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java a40a88d
ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f
ql/src/test/queries/clientpositive/drop_partitions_partialspec.q PRE-CREATION
ql/src/test/results/clientnegative/drop_partition_failure.q.out cde0abb
ql/src/test/results/clientnegative/drop_partition_filter_failure.q.out c4f533b
ql/src/test/results/clientpositive/drop_multi_partitions.q.out eae57f3
ql/src/test/results/clientpositive/drop_partitions_partialspec.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/19903/diff/
Testing
-------
Thanks,
Navis Ryu
Re: Review Request 19903: Support bulk deleting directories for partition
drop with partial spec
Posted by Navis Ryu <na...@nexr.com>.
(Updated April 3, 2014, 2:09 a.m.)
Review request for hive.
Changes
-------
Addressed comments and fixed test failures
Bugs: HIVE-6809
https://issues.apache.org/jira/browse/HIVE-6809
Repository: hive-git
Description
-------
On a busy Hadoop cluster, dropping many partitions takes much longer than expected. In hive-0.11.0, removing 1700 partitions with a single partial spec took 90 minutes, which dropped to 3 minutes when deleteData was set to false. I couldn't test this on a recent Hive that includes HIVE-6256, but if most of the time is spent removing directories, that change seems unlikely to reduce the overall processing time much.
Diffs (updated)
-----
metastore/if/hive_metastore.thrift eef1b80
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h 2a1b4d7
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp 9567874
metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp b18009c
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java 4f051af
metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php c79624f
metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote fdedb57
metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py 23679be
metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb 56c23e6
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 27077b4
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 664dccd
metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 0c2209b
metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 6a0eabe
metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java e0de0e0
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java 5c00aa1
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java 5025b83
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 5cb030c
ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 5d5fa78
ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java a73a5e0
ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java ba30e1f
ql/src/test/queries/clientpositive/drop_partitions_partialspec.q PRE-CREATION
ql/src/test/results/clientnegative/drop_partition_failure.q.out b94c6c2
ql/src/test/results/clientnegative/drop_partition_filter_failure.q.out 0ab5e02
ql/src/test/results/clientpositive/drop_multi_partitions.q.out 735920b
ql/src/test/results/clientpositive/drop_partitions_partialspec.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/19903/diff/
Testing
-------
Thanks,
Navis Ryu
Re: Review Request 19903: Support bulk deleting directories for partition
drop with partial spec
Posted by Navis Ryu <na...@nexr.com>.
> On April 2, 2014, 8:59 p.m., Sergey Shelukhin wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, line 2273
> > <https://reviews.apache.org/r/19903/diff/1/?file=544980#file544980line2273>
> >
> > no check for archived?
Ah, missed that.
> On April 2, 2014, 8:59 p.m., Sergey Shelukhin wrote:
> > metastore/if/hive_metastore.thrift, line 766
> > <https://reviews.apache.org/r/19903/diff/1/?file=544971#file544971line766>
> >
> > This is not a backward-compatible change. Have you considered modifying the new API that uses the req/resp pattern? Otherwise, new calls will have to be added.
I wanted to avoid adding more methods to the metastore (ThriftHiveMetastore already has 125K lines of code). Callers of drop_partition could use drop_partition_by_name instead. I'll add those methods to the metastore client.
- Navis
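The compatibility point can be shown without Thrift. In the sketch below, a call that takes a single request object gains a new option by adding a field with a default value, so existing callers keep compiling and behave as before, whereas changing a positional signature would break them. All names here (DropRequest, DropResponse, dropPartitions, the bulkDelete flag) are hypothetical and are not the metastore's actual API.

```java
import java.util.Arrays;
import java.util.List;

public class ReqRespSketch {
    /** Hypothetical request object; new options arrive as fields with defaults. */
    public static class DropRequest {
        final String table;
        final List<String> partVals;
        boolean deleteData = true;   // option present from the start
        boolean bulkDelete = false;  // option added later; old callers never set it

        public DropRequest(String table, List<String> partVals) {
            this.table = table;
            this.partVals = partVals;
        }
    }

    /** Hypothetical response object; it can grow new fields the same way. */
    public static class DropResponse {
        final String summary;

        DropResponse(String summary) {
            this.summary = summary;
        }
    }

    /** The method signature stays fixed while the request type evolves. */
    public static DropResponse dropPartitions(DropRequest req) {
        String mode;
        if (!req.deleteData) {
            mode = "keep data";
        } else if (req.bulkDelete) {
            mode = "bulk delete";
        } else {
            mode = "per-partition delete";
        }
        return new DropResponse(req.table + " " + req.partVals + ": " + mode);
    }

    public static void main(String[] args) {
        // A caller written before bulkDelete existed is unaffected by it.
        DropRequest oldCaller = new DropRequest("t", Arrays.asList("2014-04-01"));
        DropRequest newCaller = new DropRequest("t", Arrays.asList("2014-04-01"));
        newCaller.bulkDelete = true;
        System.out.println(dropPartitions(oldCaller).summary);
        System.out.println(dropPartitions(newCaller).summary);
    }
}
```

The trade-off matches the discussion above: a request/response call costs one extra type up front, but later options do not require new methods or signature changes.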
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19903/#review39347
-----------------------------------------------------------
Re: Review Request 19903: Support bulk deleting directories for partition
drop with partial spec
Posted by Sergey Shelukhin <se...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19903/#review39347
-----------------------------------------------------------
metastore/if/hive_metastore.thrift
<https://reviews.apache.org/r/19903/#comment71707>
This is not a backward-compatible change. Have you considered modifying the new API that uses the req/resp pattern? Otherwise, new calls will have to be added.
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
<https://reviews.apache.org/r/19903/#comment71708>
nit: verify that there are fewer vals than keys?
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
<https://reviews.apache.org/r/19903/#comment71710>
no check for archived?
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
<https://reviews.apache.org/r/19903/#comment71711>
nit: typo
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
<https://reviews.apache.org/r/19903/#comment71712>
Asserts can be disabled in production; this probably needs an explicit check that throws.
ql/src/java/org/apache/hadoop/hive/ql/plan/DropTableDesc.java
<https://reviews.apache.org/r/19903/#comment71713>
should these two explains have different names?
ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionSpec.java
<https://reviews.apache.org/r/19903/#comment71714>
why is this needed? I don't see it used anywhere
- Sergey Shelukhin
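On the assert comment above: Java assertions are skipped unless the JVM is started with -ea, so a condition guarded only by assert goes unchecked in a typical production launch. A minimal sketch of the check-and-throw alternative follows; the method name and the condition are hypothetical and do not come from DDLTask.

```java
import java.util.Arrays;
import java.util.List;

public class CheckAndThrow {
    /** Hypothetical validation; an explicit check runs whether or not -ea is set. */
    public static void requireSingleSpec(List<String> specs) {
        // An "assert specs.size() == 1;" here would do nothing without -ea.
        if (specs.size() != 1) {
            throw new IllegalStateException(
                "expected exactly one partition spec, got " + specs.size());
        }
    }

    public static void main(String[] args) {
        requireSingleSpec(Arrays.asList("ds=2014-04-01")); // passes silently
        try {
            requireSingleSpec(Arrays.asList("ds=1", "ds=2"));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```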