Posted to dev@impala.apache.org by Omkar Rohadkar <om...@ellicium.com> on 2023/04/24 07:15:20 UTC

Impala multinode installation

Hi Team,

I have successfully installed open-source Impala and integrated it with
Apache services such as Hadoop, Hive, and HBase on a single node. This setup
is working fine.

Now I want to install Impala in a multi-node environment and integrate it
with the same Apache services. I have a few questions about this.

When we build and install Impala and start it using the
start-impala-cluster.py script, it starts all the Impala services:
1 catalogd, 1 statestore, and 3 Impala daemons.

In the multi-node setup, however, we want only catalogd and the statestore
to run on the master node, and one Impala daemon on each worker node.

How can we achieve this? Do we need to build Impala differently? And which
config files or scripts do we need to change to get the above setup?

For a multi-node installation, if we build Impala on every node, how can we
combine the nodes into one cluster?

I hope you can help with the above questions.

Thanks and regards,
Omkar R


Re: Impala multinode installation

Posted by Xiang Yang <yx...@126.com>.
Hi Omkar R,

There is an ongoing patch that may help you: https://gerrit.cloudera.org/c/18939/
You can also refer to this conversation: https://lists.apache.org/thread/8l3q8szosw4rh8yg9g5njb0w0fmwc3yp

Best regards,
Xiang Yang

Re: Impala multinode installation

Posted by Tamás Máté <tm...@apache.org>.
Hi Omkar R,

Thanks for reaching out to us. My understanding is that you are looking for
a way to deploy and start an Impala cluster in a multi-node environment.

At this point, Impala does not have a solution for deploying or starting a
cluster across multiple hosts. The start-impala-cluster.py script starts and
configures a cluster on a single host, or it can start the cluster in a
dockerised environment. It can still help you figure out which
configurations are needed to deploy a cluster.

Impala uses the gflags package, so it can be configured with a flagfile or
with CLI arguments. Some examples can be found here:
https://impala.apache.org/docs/build/html/topics/impala_config_options.html
That page shows how to point the Impala services at each other on the right
hosts.
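
As a rough illustration of the flag-based wiring described above, the sketch
below renders one gflags flagfile per role. The flag names (hostname,
state_store_host, catalog_service_host) are assumed to match the ones on the
config options page; the hostnames and helper functions are made up for the
example, so please verify everything against the documentation for your
Impala version.

```python
# Sketch only: hostnames and helper names are hypothetical, and the flag
# names should be verified against the Impala config options page.

MASTER_HOST = "master.example.com"  # assumed host for statestored and catalogd


def render_flagfile(flags):
    """Render a dict of flags in gflags flagfile format, one --name=value per line."""
    return "".join("--{}={}\n".format(name, value) for name, value in flags.items())


def flags_for_role(role, hostname, master=MASTER_HOST):
    """Return the flags that tell a given daemon where its peers run."""
    if role == "statestored":
        return {"hostname": hostname}
    if role == "catalogd":
        # catalogd registers with the statestore on the master node.
        return {"hostname": hostname, "state_store_host": master}
    if role == "impalad":
        # Each worker daemon needs to find both the statestore and catalogd.
        return {
            "hostname": hostname,
            "state_store_host": master,
            "catalog_service_host": master,
        }
    raise ValueError("unknown role: {}".format(role))


if __name__ == "__main__":
    print(render_flagfile(flags_for_role("impalad", "worker1.example.com")))
```

Each daemon could then be started with --flagfile pointing at its generated
file, which is the standard gflags way to load flags from a file.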

Additionally, Impala depends on multiple configuration files to work with
other Hadoop components. For example, the HDFSClient inside Impala needs
core-site.xml. In a working dev environment you can find these files under
'${IMPALA_HOME}/fe/src/test/resources'. If you check core-site.xml there,
you will see that the defaultFS config points to a local HDFS:
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:20500</value>
      </property>
You will need to create these config files with the right Hadoop cluster
configurations on every node.
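
One way to adapt the dev core-site.xml for a real cluster is sketched below:
a small helper that sets a property such as fs.defaultFS in a Hadoop-style
configuration document. The NameNode address is a placeholder and the
set_property helper is hypothetical, not something shipped with Impala.

```python
import xml.etree.ElementTree as ET

# Same shape as the dev core-site.xml snippet above.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:20500</value>
  </property>
</configuration>"""


def set_property(config_xml, name, value):
    """Return config_xml with the named Hadoop property set to value."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            prop.find("value").text = value
            break
    else:
        # Property not present yet: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # namenode.example.com:8020 is a placeholder for the real NameNode address.
    print(set_property(SAMPLE, "fs.defaultFS", "hdfs://namenode.example.com:8020"))
```

The same generated file would need to be copied to every node, alongside
the other Hadoop client configs that the dev environment keeps in that
directory.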

Hope this helps, let me know if you get stuck during the process.

Best regards,
Tamas

