Posted to dev@impala.apache.org by Bharath Vissapragada <bh...@cloudera.com> on 2017/09/19 16:20:23 UTC

Re: FW: Letter about Impala and IGFS problems.

In Impala's context, disk-ID corresponds to the ID of a local disk (on a
data node) hosting a particular block replica of a given file. I'm not
familiar with the internals of IGFS but from a quick read [1], it looks
like an in-memory FS. So, I don't think the idea of "disk ID" makes sense.

To fix this, I think we need to make some Impala side changes to ignore
loading disk IDs in such cases (patches are welcome :)).
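
As a rough illustration of the kind of check such a change might make, here is a minimal Python sketch (not Impala code; the actual frontend is Java). The function name, the scheme set, and the example URIs are all hypothetical, and treating "igfs" as the URI scheme of an IGFS-backed table is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical set of URI schemes whose storage has no per-host local
# disks, so asking for a disk ID is meaningless. Assumption: IGFS-backed
# tables are addressed with the "igfs" scheme.
SCHEMES_WITHOUT_DISK_IDS = {"s3a", "adl", "igfs"}


def should_load_disk_ids(location: str) -> bool:
    """Return True if disk IDs are worth loading for this table location.

    A location without a scheme is assumed to live on the default
    (HDFS-like) filesystem, so disk IDs are still loaded for it.
    """
    scheme = urlparse(location).scheme.lower()
    return scheme not in SCHEMES_WITHOUT_DISK_IDS
```

With a check like this, the metadata loader could skip the disk-ID lookup for an IGFS location instead of recording an unknown disk ID that later surfaces as a warning.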

FWIW, we did somewhat similar things while integrating S3/ADLS filesystems
where there is no concept of block replicas and we just synthesized dummy
metadata based on file range splits [2].
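
To make "dummy metadata based on file range splits" concrete, here is a minimal Python sketch of the general idea, not the code in [2]. The class name, the function name, and the -1 sentinel for an unknown disk are hypothetical, and the split size is an assumed input.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sentinel meaning "no disk is associated with this range".
UNKNOWN_DISK_ID = -1


@dataclass
class SyntheticBlock:
    offset: int  # byte offset of the range within the file
    length: int  # number of bytes in the range
    disk_id: int = UNKNOWN_DISK_ID


def synthesize_blocks(file_length: int, split_size: int) -> List[SyntheticBlock]:
    """Cut a file into fixed-size ranges that stand in for real blocks."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    blocks = []
    offset = 0
    while offset < file_length:
        # The last range may be shorter than split_size.
        length = min(split_size, file_length - offset)
        blocks.append(SyntheticBlock(offset, length))
        offset += length
    return blocks
```

For a 250-byte file and a split size of 100, this yields ranges at offsets 0, 100, and 200 with lengths 100, 100, and 50, none of which claims a real disk.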

[1] https://ignite.apache.org/features/igfs.html
[2] https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L292

On Tue, Sep 19, 2017 at 4:28 AM, Andrey Kuznetsov
<Andrey_Kuznetsov@epam.com> wrote:

> Hi folks,
> We have a problem integrating Impala with IGFS. Selecting from tables on
> IGFS causes a warning:
>
> WARNINGS: Unknown disk id.  This will negatively affect performance.
> Check your hdfs settings to enable block location metadata. (1 of 2
> similar).
>
> Is this a problem with IGFS? Can we enable <block location metadata> on IGFS?
>
> Best regards,
> ANDREY KUZNETSOV
> Software Engineering Team Leader
>
> Office: +7 482 263 00 70 x 42766   Cell: +7 920 154 05 72
> Email: andrey_kuznetsov@epam.com
> Tver, Russia   epam.com <http://www.epam.com/>
>
> CONFIDENTIALITY CAUTION AND DISCLAIMER
> This message is intended only for the use of the individual(s) or
> entity(ies) to which it is addressed and contains information that is
> legally privileged and confidential. If you are not the intended recipient,
> or the person responsible for delivering the message to the intended
> recipient, you are hereby notified that any dissemination, distribution or
> copying of this communication is strictly prohibited. All unintended
> recipients are obliged to delete this message and destroy any printed
> copies.
>

RE: FW: Letter about Impala and IGFS problems.

Posted by Andrey Kuznetsov <An...@epam.com>.
Hi,
So it seems this warning is not our problem (our problem is with performance; it looks like IGFS loads the network too heavily).
Thank you for your answer.

Best regards,
ANDREY KUZNETSOV
Software Engineering Team Leader, Assessment Global Discipline Head (Java)

Office: +7 482 263 00 70 x 42766   Cell: +7 920 154 05 72   Email: andrey_kuznetsov@epam.com
Tver, Russia   epam.com <http://www.epam.com/>

