Posted to issues@hive.apache.org by "Eugene Koifman (JIRA)" <ji...@apache.org> on 2015/09/16 21:46:45 UTC
[jira] [Updated] (HIVE-11683) Hive Streaming may overload the metastore
[ https://issues.apache.org/jira/browse/HIVE-11683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman updated HIVE-11683:
----------------------------------
Component/s: Metastore
> Hive Streaming may overload the metastore
> -----------------------------------------
>
> Key: HIVE-11683
> URL: https://issues.apache.org/jira/browse/HIVE-11683
> Project: Hive
> Issue Type: Bug
> Components: HCatalog, Hive, Metastore, Transactions
> Affects Versions: 1.0.0
> Reporter: Eugene Koifman
> Assignee: Roshan Naik
>
> HiveEndPoint represents a way to write to a specific partition transactionally.
> Each HiveEndPoint creates TransactionBatch(es) and commits transactions.
> Suppose you have 10 instances of the Storm Hive bolt using the Streaming API.
> Each instance will create HiveEndPoints on demand when it sees an event for a particular partition value.
> If events are uniformly distributed with respect to partition values and the table has 1000 partitions (for example, it is partitioned by CustomerId), each of the 10 bolt instances may create 1000 HiveEndPoints, and thus more than 10,000 concurrent transactions (more precisely, 10,000 * num_txn_per_batch).
> This creates a huge amount of Metastore traffic.
> HIVE-11672 is investigating how some sort of "shuffle" phase can be added to route events for a particular bucket to the same bolt instance.
> The same idea should be explored to route events based on partition value.
> cc [~alangates],[~sriharsha],[~rbains]
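
The arithmetic in the description, and the effect of routing by partition value, can be illustrated with a rough sketch. This is not Hive or Storm code: the route() function, the CRC32 hash, the "customer-N" partition values and the TXNS_PER_BATCH value of 100 are all assumptions made for illustration. Only the figures of 10 bolt instances and 1000 partitions come from the issue. The routing step is similar in spirit to grouping a stream on a key field, so that every event for a given partition value lands on the same bolt instance.

```python
import zlib

NUM_BOLTS = 10          # from the issue description
NUM_PARTITIONS = 1000   # from the issue description
TXNS_PER_BATCH = 100    # hypothetical value for num_txn_per_batch


def route(partition_value: str, num_bolts: int) -> int:
    """Deterministically map a partition value to one bolt instance (assumed scheme)."""
    return zlib.crc32(partition_value.encode("utf-8")) % num_bolts


def endpoints_without_routing(num_bolts: int, num_partitions: int) -> int:
    """Worst case: every bolt eventually sees every partition value."""
    return num_bolts * num_partitions


def endpoints_with_routing(partition_values, num_bolts: int) -> int:
    """Each bolt only opens endpoints for the partition values routed to it."""
    per_bolt = {}
    for value in partition_values:
        per_bolt.setdefault(route(value, num_bolts), set()).add(value)
    return sum(len(values) for values in per_bolt.values())


# Hypothetical partition values standing in for CustomerId.
partitions = [f"customer-{i}" for i in range(NUM_PARTITIONS)]

unrouted = endpoints_without_routing(NUM_BOLTS, NUM_PARTITIONS)
routed = endpoints_with_routing(partitions, NUM_BOLTS)

print("endpoints without routing:", unrouted)
print("open transactions without routing:", unrouted * TXNS_PER_BATCH)
print("endpoints with routing:", routed)
print("open transactions with routing:", routed * TXNS_PER_BATCH)
```

Under these assumptions, routing cuts the number of endpoints from bolts * partitions to one per partition, because each partition value is owned by exactly one bolt instance. How evenly the partitions spread across bolts depends on the hash and on the actual key distribution.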
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)