Posted to user@hive.apache.org by Xiaobo Gu <gu...@gmail.com> on 2012/05/13 14:36:06 UTC

Hive production layout suggestions

Hi,

To let multiple users share a single Hive instance, we know that we
should use a standalone metastore service. But what about the CLI (and
other clients) and the hiveserver services? What is the best practice
for the server layout of a production Hive instance?

1. I think the hive metastore, hwi, and hiveserver services are all
Hadoop clients, so they should run on servers that are not part of the
Hadoop cluster. We should prepare a dedicated server for them, or
one server per service, depending on the workload.
2. For CLI users, because the CLI has an embedded hiveserver, which can
connect to the metastore service directly, we can install the Hive CLI
on their workstations, along with the same Hadoop/Hive binaries and
configuration files.
3. JDBC and ODBC clients must connect to a hiveserver, which can only
handle one query at a time, so we must start one hiveserver service
for each client. Only the JDBC/ODBC driver is needed on the client;
no Hive or Hadoop binaries are needed there.

Am I missing anything?

Regards,

Xiaobo Gu

Re: Hive production layout suggestions

Posted by Xiaobo Gu <gu...@gmail.com>.
On Sun, May 13, 2012 at 10:28 PM, Edward Capriolo <ed...@gmail.com> wrote:
>  Xiaobo,

>
> 2) CLI does not have an embedded hive server

I mean the CLI can talk to the metastore service directly, and it has
a Hive Driver (not a HiveServer).

Re: Hive production layout suggestions

Posted by Edward Capriolo <ed...@gmail.com>.
 Xiaobo,

I believe you misunderstand some basic parts of hive.

1) You do not need to run the metastore server. It is an optional
component. Many people use JDBC and this allows multiple users to
concurrently use hive without having separate installs.

2) CLI does not have an embedded hive server

3) Hive servers can handle more than one connection at once, but they
have a few subtle concurrency issues that are being worked on.
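
To make point 3 concrete, here is a minimal sketch of a JDBC client,
assuming the original Thrift-based HiveServer listening on its default
port 10000. The host name hiveserver.example.com and the class name
HiveJdbcSketch are hypothetical. The client classpath is assumed to
carry the Hive JDBC driver jar and the jars it depends on; no local
Hive or Hadoop installation is configured here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {
    // JDBC driver class used by clients of the original HiveServer.
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

    // Builds a URL of the form jdbc:hive://host:port/database.
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical host; pass the real HiveServer host as the first argument.
        String host = args.length > 0 ? args[0] : "hiveserver.example.com";
        String url = jdbcUrl(host, 10000, "default");

        try {
            Class.forName(DRIVER);
        } catch (ClassNotFoundException e) {
            // Without the driver jar on the classpath there is nothing to connect with.
            System.err.println("Hive JDBC driver not found; would have used " + url);
            return;
        }

        // Empty user name and password are an assumption for an unsecured server.
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

Several such clients can point at the same server URL. Given the
concurrency issues mentioned above, it is worth testing that setup
under load before relying on it.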
