Posted to user@impala.apache.org by Aleksei Maželis <ol...@gmail.com> on 2017/04/25 05:28:04 UTC

Impala-Kudu - minimal set of dependencies

Hi,

I am looking for the minimal set of dependencies that an Impala-Kudu setup
will have. When reading how-tos and checking available dockerfiles, the
list of items that Impala depends on seems to include at least:
- hadoop-hdfs-namenode
- hadoop-hdfs-datanode
- hive-metastore.

However, if I understand correctly, when Kudu is used along with Impala,
these aren't necessarily used. So, the question is if the above
dependencies of HDFS and Hive can be avoided, and if so, whether there are
side effects this would bring. And of course, if I am missing some other
mandatory dependencies, hints on those would be more than welcome!

With best regards,
Alex

Re: Impala-Kudu - minimal set of dependencies

Posted by Matthew Jacobs <mj...@cloudera.com>.
Hi Aleksei,

I don't think there's enough information here to help diagnose
further. Can you share more of the log?

FWIW we don't test this scenario so it very well might not work.

-Matt


Re: Impala-Kudu - minimal set of dependencies

Posted by Aleksei Maželis <ol...@gmail.com>.
It seems that Impala indeed starts without HDFS when the flag is reset as
suggested. I'm also observing some errors in the logs, due to the inability
to find/connect to HDFS:

Could not read the root directory at hdfs://0.0.0.0:8020.
Error was: Call From localhost/ZZ.ZZZ.ZZ.ZZZ to 0.0.0.0:8020 failed on
connection exception: java.net.ConnectException: Connection refused; For
more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

This is likely because HDFS is still mentioned in the configs
(specifically, core-site.xml still sets fs.default.name to
hdfs://0.0.0.0:8020). Should this be modified somehow to refer to the local
file system or removed altogether?
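
To confirm which value core-site.xml actually carries before changing
anything, a small check like the one below can help. It is only a sketch:
the sample XML is made up to mirror the setting described above, the helper
names are hypothetical, and it reports the current value without deciding
whether pointing it at the local file system or removing it is the right
fix. It looks for both fs.default.name and fs.defaultFS, the newer name for
the same setting in Hadoop.

```python
import xml.etree.ElementTree as ET

# Property names Hadoop reads for the default filesystem;
# fs.default.name is the older, deprecated spelling of fs.defaultFS.
DEFAULT_FS_KEYS = ("fs.defaultFS", "fs.default.name")


def default_fs(core_site_xml: str):
    """Return (property name, value) for the default filesystem, or None."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in DEFAULT_FS_KEYS:
            return name, (prop.findtext("value") or "").strip()
    return None


def points_at_hdfs(core_site_xml: str) -> bool:
    """True if the default filesystem is configured with an hdfs:// URI."""
    found = default_fs(core_site_xml)
    return found is not None and found[1].startswith("hdfs://")


# Hypothetical sample mirroring the setting described in this message.
SAMPLE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:8020</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(default_fs(SAMPLE))
    print(points_at_hdfs(SAMPLE))
```

For a real deployment, the file contents would be read from wherever the
Impala daemon picks up its Hadoop configuration, which depends on the
installation.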

Regards,
Alex


Re: Impala-Kudu - minimal set of dependencies

Posted by Aleksei Maželis <ol...@gmail.com>.
Ok, great, thanks for elaborating! I will give it a try with the
abort_on_config_error flag reset!

As to the Hive dependency, I will stay tuned for the future updates then.

Regards,
Alex


Re: Impala-Kudu - minimal set of dependencies

Posted by Alexander Behm <al...@cloudera.com>.
You should be able to bring Impala up without HDFS by passing the
"--abort_on_config_error=false" startup flag.
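
As a rough illustration of passing that flag, the sketch below builds a
command line for the daemon. The flag itself is the one quoted above;
everything else is an assumption for the example: the helper name is
hypothetical, the daemon is assumed to be launched directly as `impalad`,
and packaged installs may instead pass startup flags through their own
defaults or service files.

```python
import shlex

# Startup flag quoted in this message.
FLAG = "--abort_on_config_error=false"


def impalad_argv(extra_args=()):
    """Build an impalad argument list that carries the flag exactly once."""
    argv = ["impalad"]
    # Drop any existing setting of the same flag so ours is the only one.
    argv += [a for a in extra_args if not a.startswith("--abort_on_config_error")]
    argv.append(FLAG)
    return argv


if __name__ == "__main__":
    # Print the command line rather than launching anything.
    print(shlex.join(impalad_argv()))
```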

Just to clarify:
You still need the Hive Metastore with Kudu because table and column
statistics are stored in the Metastore; our plan is to eventually remove
this dependency.


Re: Impala-Kudu - minimal set of dependencies

Posted by Aleksei Maželis <ol...@gmail.com>.
Ok, I see. I remember trying to run Impala-Kudu without HDFS, and remember
that Impala failed to start. But perhaps there is a way to make Impala work
without HDFS after re-configuring it properly; any hints on which
configurations to change would be very helpful!

BR,
Alex


Re: Impala-Kudu - minimal set of dependencies

Posted by Jim Apple <jb...@cloudera.com>.
I believe the Hive Metastore is needed even when Kudu is the storage
engine. I don't know if the HDFS namenode and datanodes are needed.
