Posted to user@impala.apache.org by Geetika Gupta <ge...@knoldus.in> on 2018/04/25 06:12:26 UTC

Problem in Querying data on Impala Cluster

Hello Community,

We are trying to query Parquet data stored in HDFS through an Impala cluster,
but when we execute our query, the following error appears in the *impalad*
logs of the machine:

*W0424 18:32:40.928611  7655 negotiation.cc:306] Unauthorized connection
attempt: Server connection negotiation failed: server connection from
<node_ip_of_different_machine>:43448: unencrypted connections from publicly
routable IPs are prohibited. See --trusted_subnets flag for more
information.: <node_ip_of_different_machine>:43448*

We encounter this problem only when the Impala cluster has multiple nodes.
It works fine on a single machine.



-- 
Regards,
Geetika Gupta

Re: Problem in Querying data on Impala Cluster

Posted by Geetika Gupta <ge...@knoldus.in>.
Hi Community,

We are getting the following status in the query details tab of the executor
UI:

*Status: File
'hdfs://hadoop-master:54311/opt/spark-2.1.0-bin-hadoop2.7/spark-warehouse/parquet500.db/customer/part-00010-12db8b7a-02ae-463f-8af6-667e62dca833.snappy.parquet'
has an incompatible Parquet schema for column
'parquet500.customer.c_acctbal'. Column type: DECIMAL(15,2), Parquet
schema: optional int64 C_ACCTBAL [i:5 d:1 r:0]*

We created this table through Spark, using Parquet as the file format.
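
For context on the status above: Parquet's DECIMAL logical type allows a
value to be stored as an unscaled integer, and for precision up to 18 that
integer fits in an int64. The sketch below is plain Python with hypothetical
helper names (they are not part of Impala or Spark). It shows how a
DECIMAL(15,2) value would map to such an int64, assuming that is how Spark
wrote c_acctbal; the thread itself does not confirm the cause of the
mismatch.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1


def fits_in_int64(precision: int) -> bool:
    """True if every unscaled value of this decimal precision fits in a signed int64."""
    return 10**precision - 1 <= INT64_MAX


def decode_unscaled(unscaled: int, scale: int) -> Decimal:
    """Turn an unscaled integer into its decimal value: unscaled * 10**-scale."""
    return Decimal(unscaled).scaleb(-scale)


def encode_unscaled(value: Decimal, scale: int) -> int:
    """Turn a decimal value into its unscaled integer (extra digits are truncated)."""
    return int(value.scaleb(scale))
```

Under this assumption, an account balance of 711.56 in a DECIMAL(15,2) column
would be stored as the integer 71156, which is consistent with a reader that
expects a different physical type rejecting the file.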







-- 
Regards,
Geetika Gupta

Re: Problem in Querying data on Impala Cluster

Posted by Sailesh Mukil <sa...@cloudera.com>.
On Wed, Apr 25, 2018 at 5:02 AM, Geetika Gupta <ge...@knoldus.in>
wrote:

> Thanks Sailesh Mukil.
>
> It resolved the issue.
> But now we are encountering some other error:
>
> *I0425 17:20:35.209463  7481 Frontend.java:987] Analyzing query: select
> c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue,
> c_acctbal, n_name, c_address, c_phone, c_comment from lineitem, orders,
> customer, nation where o_custkey=c_custkey and l_orderkey = o_orderkey and
> c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone,
> n_name, c_address, c_comment order by revenue desc limit 20*
> *I0425 17:20:35.214584  7481 Frontend.java:999] Analysis finished.*
> *I0425 17:20:36.308864  1333 coordinator.cc:783] Release admission control
> resources for query_id=73443760aff6bec9:4f2fcc1e00000000*
> *I0425 17:20:36.313413 10127 query-state.cc:288] Cancelling fragment
> instances as directed by the coordinator. Returned status:
> ReportExecStatus(): Received report for unknown query ID (probably closed
> or cancelled): 73443760aff6bec9:4f2fcc1e00000000*
>
> These are the logs from the *impalad* process on that machine. We are
> encountering this error only for some of the queries.
>
>
This error only says that the coordinator cancelled the query for some
reason. The log on the coordinator node should contain another message for
that query stating the actual reason for the failure. I'm not sure about the
Parquet schema mismatch failure, but someone else might get back to you
regarding that.
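
One way to look for that other message is to filter the coordinator's log
for lines that mention the query ID and carry a warning or error severity.
A minimal sketch, assuming each log entry is a single line that starts with
a severity letter (I, W, E or F), as in the excerpts quoted above; the
function name is hypothetical, and the real cause may be logged without the
query ID on the same line.

```python
from typing import Iterable, List


def lines_for_query(lines: Iterable[str], query_id: str,
                    severities: str = "WEF") -> List[str]:
    """Return log lines that mention query_id and start with one of the
    given severity letters (W = warning, E = error, F = fatal)."""
    hits = []
    for line in lines:
        text = line.strip()
        # Skip blank lines, then match on both the severity prefix and the ID.
        if text and text[0] in severities and query_id in text:
            hits.append(text)
    return hits
```

Passing the lines of the coordinator's impalad log together with
73443760aff6bec9:4f2fcc1e00000000 would return only the warning, error and
fatal entries for that query; passing severities="IWEF" also includes the
informational lines.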



Re: Problem in Querying data on Impala Cluster

Posted by Geetika Gupta <ge...@knoldus.in>.
Thanks Sailesh Mukil.

It resolved the issue.
But now we are encountering some other error:

*I0425 17:20:35.209463  7481 Frontend.java:987] Analyzing query: select
c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue,
c_acctbal, n_name, c_address, c_phone, c_comment from lineitem, orders,
customer, nation where o_custkey=c_custkey and l_orderkey = o_orderkey and
c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone,
n_name, c_address, c_comment order by revenue desc limit 20*
*I0425 17:20:35.214584  7481 Frontend.java:999] Analysis finished.*
*I0425 17:20:36.308864  1333 coordinator.cc:783] Release admission control
resources for query_id=73443760aff6bec9:4f2fcc1e00000000*
*I0425 17:20:36.313413 10127 query-state.cc:288] Cancelling fragment
instances as directed by the coordinator. Returned status:
ReportExecStatus(): Received report for unknown query ID (probably closed
or cancelled): 73443760aff6bec9:4f2fcc1e00000000*

These are the logs from the *impalad* process on that machine. We are
encountering this error only for some of the queries.




-- 
Regards,
Geetika Gupta

Re: Problem in Querying data on Impala Cluster

Posted by Sailesh Mukil <sa...@cloudera.com>.
Hi Geetika,

It looks like you're using unencrypted connections that don't fall under
the local subnet or a private network, which means you're potentially
trying to send unencrypted data over a public network between nodes.

We explicitly disallow these kinds of connections by default. However, if
you still feel like you want to go ahead with this configuration, or that
the above explanation is a mistake, this might help you:
https://github.com/apache/impala/blob/6f2ebadf8d119b1486f54b911ba3c7ecc1921d55/be/src/kudu/rpc/server_negotiation.cc#L70-L80

You can set the 'trusted_subnets' startup flag to whitelist the subnet that
your Impala nodes' IP addresses fall under.
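
To pick a value for the flag, it can help to check which subnet each node's
address falls under. A minimal sketch using Python's standard ipaddress
module; the function name, the sample addresses, and the comma-separated
CIDR list format are assumptions for illustration, and this is not the check
that Impala performs internally.

```python
import ipaddress


def is_in_subnets(peer_ip: str, subnets: str) -> bool:
    """Return True if peer_ip falls inside any subnet of a comma-separated
    CIDR list such as "10.0.0.0/8,192.168.0.0/16"."""
    addr = ipaddress.ip_address(peer_ip)
    for cidr in subnets.split(","):
        cidr = cidr.strip()
        # Ignore empty entries left by stray commas.
        if cidr and addr in ipaddress.ip_network(cidr):
            return True
    return False


# Hypothetical candidate value covering two private ranges.
candidate = "10.0.0.0/8,192.168.0.0/16"
print(is_in_subnets("10.1.2.3", candidate))
print(is_in_subnets("203.0.113.7", candidate))
```

Any node address for which the check is false would still be treated as
outside the listed subnets, so the list needs to cover every node in the
cluster.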

I hope this helps.

- Sailesh
