You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Attila Bukor (Jira)" <ji...@apache.org> on 2020/07/01 23:22:00 UTC

[jira] [Commented] (KUDU-3154) RangerClientTestBase.TestLogging sometimes fails

    [ https://issues.apache.org/jira/browse/KUDU-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149740#comment-17149740 ] 

Attila Bukor commented on KUDU-3154:
------------------------------------

Hm this is very interesting. I've seen this happen again in a test. A response is never constructed, which [would be logged|[https://github.com/apache/kudu/blob/master/java/kudu-subprocess/src/main/java/org/apache/kudu/subprocess/ranger/RangerProtocolHandler.java#L62|https://github.com/apache/kudu/blob/d23ee5d38ddc4317f431dd65df0c825c00cc968a/java/kudu-subprocess/src/main/java/org/apache/kudu/subprocess/ranger/RangerProtocolHandler.java#L62]], so I guess it is possible that it never gets evaluated, but I think it's unlikely that this happens due to the high frequency of the requests.

It's configured to refresh policies every [200ms|[https://github.com/apache/kudu/blob/master/src/kudu/ranger/mini_ranger.h#L193]] and from the logs we can see that it does indeed get refreshed every 200ms + epsilon and takes ~30ms so even if there's a lock shared by the authorizer and the refresher the authorizer should have plenty of time to acquire.
{code:java}
2020-06-20 07:05:49.672 [DEBUG - PolicyRefresher(serviceName=kudu)-12] (PolicyRefresher.java:489) ==> PolicyRefresher(serviceName=kudu).loadRoles()
2020-06-20 07:05:49.713 [DEBUG - PolicyRefresher(serviceName=kudu)-12] (PolicyRefresher.java:287) <== PolicyRefresher(serviceName=kudu).loadPolicy()
2020-06-20 07:05:49.872 [DEBUG - PolicyRefresher(serviceName=kudu)-12] (PolicyRefresher.java:489) ==> PolicyRefresher(serviceName=kudu).loadRoles()
2020-06-20 07:05:49.896 [DEBUG - PolicyRefresher(serviceName=kudu)-12] (PolicyRefresher.java:287) <== PolicyRefresher(serviceName=kudu).loadPolicy()
2020-06-20 07:05:50.072 [DEBUG - PolicyRefresher(serviceName=kudu)-12] (PolicyRefresher.java:489) ==> PolicyRefresher(serviceName=kudu).loadRoles()
2020-06-20 07:05:50.292 [DEBUG - PolicyRefresher(serviceName=kudu)-12] (PolicyRefresher.java:287) <== PolicyRefresher(serviceName=kudu).loadPolicy()
2020-06-20 07:05:50.473 [DEBUG - PolicyRefresher(serviceName=kudu)-12] (PolicyRefresher.java:489) ==> PolicyRefresher(serviceName=kudu).loadRoles()
{code}

Tbh I'm not sure how that InboundRequest is logged to be taken before it's put but I find it very weird. It was also logged the same way in the failing test that I've seen.

If I run this test on my Mac it succeeds and the QueueUtil messages are in the expected/correct order:

{code}
2020-07-02 01:17:19.459 [DEBUG - pool-2-thread-1] (QueueUtil.java:56) Message: org.apache.kudu.subprocess.InboundRequest@1cf099ad has been put on the queue
2020-07-02 01:17:19.459 [DEBUG - pool-3-thread-3] (QueueUtil.java:41) Message: org.apache.kudu.subprocess.InboundRequest@1cf099ad has been taken from the queue
{code}

I tried to search for some blocking queue bugs but couldn't find any so far.

> RangerClientTestBase.TestLogging sometimes fails
> ------------------------------------------------
>
>                 Key: KUDU-3154
>                 URL: https://issues.apache.org/jira/browse/KUDU-3154
>             Project: Kudu
>          Issue Type: Bug
>          Components: ranger, test
>    Affects Versions: 1.13.0
>            Reporter: Alexey Serbin
>            Priority: Major
>         Attachments: ranger_client-test.txt.xz
>
>
> The {{RangerClientTestBase.TestLogging}} scenario of the {{ranger_client-test}} sometimes fails (all types of builds) with error message like below:
> {noformat}
> src/kudu/ranger/ranger_client-test.cc:398: Failure
> Failed                                                                          
> Bad status: Timed out: timed out while in flight                                
> I0620 07:06:02.907177  1140 server.cc:247] Received an EOF from the subprocess  
> I0620 07:06:02.910923  1137 server.cc:317] get failed, inbound queue shut down: Aborted:
> I0620 07:06:02.910964  1141 server.cc:380] outbound queue shut down: Aborted:   
> I0620 07:06:02.910995  1138 server.cc:317] get failed, inbound queue shut down: Aborted:
> I0620 07:06:02.910984  1139 server.cc:317] get failed, inbound queue shut down: Aborted:
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)