Posted to issues@kudu.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2019/03/04 20:28:00 UTC

[jira] [Commented] (KUDU-2721) CHECK can be hit when there are gaps in present CPU numbers

    [ https://issues.apache.org/jira/browse/KUDU-2721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783752#comment-16783752 ] 

Tim Armstrong commented on KUDU-2721:
-------------------------------------

The options that I see for addressing this are:
* Generalise the parsing logic to handle all potential strings that can be emitted from /present
* Switch away from manually parsing and use something like get_nprocs_conf() instead.

It sounds like get_nprocs_conf() has problems of its own, though: https://lore.kernel.org/patchwork/patch/467415/. We also ran into issues with it in Impala, e.g. https://issues.apache.org/jira/browse/IMPALA-6500. So that is probably not a good idea.

I'm actually not convinced that parsing /present is the right thing either. It seems like we should be parsing /possible to account for CPUs that could be hotplugged. The format is documented here: https://www.kernel.org/doc/Documentation/cputopology.txt. All the examples in that file look like comma-separated lists of ranges, e.g. "2,4-31,32-63".

It's documented as being compatible with bitmap_parselist(), which actually supports an additional postfix notation. I *believe* this feature is not used in the output of the CPU lists.
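
For illustration, here is a rough sketch of what generalised parsing could look like, assuming the only forms that show up are single numbers and "a-b" ranges separated by commas. ParseUint and ParseMaxCpuIndex are hypothetical names, not the code in sysinfo.cc, and the upper bound on the index is an arbitrary assumption. Anything else, including the bitmap_parselist() postfix notation, makes it return false instead of hitting a CHECK.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: parses a non-negative decimal integer at s[*pos]
// and advances *pos past it. Returns false if there is no digit at *pos.
static bool ParseUint(const std::string& s, std::size_t* pos, int* out) {
  std::size_t i = *pos;
  if (i >= s.size() || !std::isdigit(static_cast<unsigned char>(s[i]))) {
    return false;
  }
  long v = 0;
  while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
    v = v * 10 + (s[i] - '0');
    if (v > 1000000) return false;  // Assumed sanity limit on CPU indexes.
    ++i;
  }
  *pos = i;
  *out = static_cast<int>(v);
  return true;
}

// Hypothetical helper: stores the highest CPU index named in a list such as
// "0-15,32-47\n" into *max_cpu. Returns false on any unrecognised input.
bool ParseMaxCpuIndex(const std::string& list, int* max_cpu) {
  assert(max_cpu != nullptr);
  std::string s = list;
  // Drop the trailing newline (and any other trailing whitespace).
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {
    s.pop_back();
  }
  if (s.empty()) return false;

  int best = -1;
  std::size_t pos = 0;
  while (true) {
    int first = 0;
    if (!ParseUint(s, &pos, &first)) return false;
    int last = first;  // A lone number is a range of length one.
    if (pos < s.size() && s[pos] == '-') {
      ++pos;
      if (!ParseUint(s, &pos, &last) || last < first) return false;
    }
    if (last > best) best = last;
    if (pos == s.size()) break;        // Consumed the whole list.
    if (s[pos] != ',') return false;   // E.g. the ":used/group" postfix.
    ++pos;
  }
  *max_cpu = best;
  return true;
}
```

With the string from the report, "0-15,32-47\n", this yields 47, so a caller that needs an upper bound on CPU numbers would use 48 rather than the count of present CPUs. Whether the maximum index or the count is the right quantity depends on what the caller does with it.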

> CHECK can be hit when there are gaps in present CPU numbers
> -----------------------------------------------------------
>
>                 Key: KUDU-2721
>                 URL: https://issues.apache.org/jira/browse/KUDU-2721
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>         Environment: SLES12-SP3
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: crash
>         Attachments: cpus.cc
>
>
> We saw a case where Impala is crashing in the Kudu client and it seems to be because the "present" string can have multiple ranges in it - "0-15,32-47\n" in this case.  See https://github.com/apache/kudu/blob/148a0c7bec6554724339a2235cbd723fb74be339/src/kudu/gutil/sysinfo.cc#L177
> I've attached a small test program based on the code illustrating that the assert gets hit.
> I think we should figure out all of the possible formats for the present string and make sure we handle them. Or figure out a different way to get equivalent info.
> A workaround for the case that we saw was to disable hyperthreading, which changed the string to being a single range.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)