Posted to jira@arrow.apache.org by "Nicola Crane (Jira)" <ji...@apache.org> on 2021/10/21 12:40:00 UTC

[jira] [Closed] (ARROW-13615) [R] Bindings for stringr::str_to_sentence

     [ https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicola Crane closed ARROW-13615.
--------------------------------
    Resolution: Won't Fix

> [R] Bindings for stringr::str_to_sentence
> -----------------------------------------
>
>                 Key: ARROW-13615
>                 URL: https://issues.apache.org/jira/browse/ARROW-13615
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Nicola Crane
>            Assignee: Dragoș Moldovan-Grünfeld
>            Priority: Major
>              Labels: good-first-issue, kernel, pull-request-available
>             Fix For: 7.0.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There is more to this issue than meets the eye. {{stringr::str_to_sentence()}} does two things:
>  * capitalises the first word
>  * if multiple sentences are provided as a single string, attempts to find sentence breaks and capitalises the first word of each sentence.
> The {{stringr}} implementation wraps {{stringi::stri_trans_totitle()}}, which in turn uses ICU’s BreakIterator to locate specific text boundaries. As a consequence, {{stringr::str_to_sentence()}} is not able to identify a full stop / period (".") as a sentence end and does not capitalise words following it. Thus, there is a discrepancy between the behaviour of the {{utf8_capitalize}} kernel (which capitalises the first word of a string without making any attempt to break it into sentences) and the behaviour of {{stringr::str_to_sentence()}}.
> For more extensive discussions around the {{stringi / stringr}} implementation see {{stringr}} issues [202|https://github.com/tidyverse/stringr/issues/202] and [231|https://github.com/tidyverse/stringr/issues/231].
> Due to the complexity of this issue and the relatively niche use cases, the recommendation is to postpone implementation.
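
The gap described above, between a whole-string capitalise and a sentence-aware one, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `terminators` parameter are hypothetical, the regex is a deliberate simplification and not ICU's BreakIterator rule set, and neither function is Arrow's kernel or the stringr implementation. The default terminator set leaves out "." purely to mirror the description's statement that a full stop is not identified as a sentence end.

```python
import re


def capitalize_whole_string(text: str) -> str:
    """Upper-case the first character and lower-case the rest.

    No attempt is made to find sentence breaks (hypothetical stand-in
    for a whole-string capitalise).
    """
    return text.capitalize()


def capitalize_each_sentence(text: str, terminators: str = "!?") -> str:
    """Lower-case the text, then capitalise the first letter of the string
    and the first letter after any terminator followed by whitespace.

    `terminators` is an assumption made for this sketch; real sentence
    segmentation is considerably more involved.
    """
    if not terminators:
        # Without terminators there is nothing to split on.
        return capitalize_whole_string(text)
    pattern = re.compile(
        r"(^|[" + re.escape(terminators) + r"]\s+)([^\W\d_])"
    )
    return pattern.sub(
        lambda m: m.group(1) + m.group(2).upper(), text.lower()
    )


sample = "hello! how ARE you. fine"
print(capitalize_whole_string(sample))
print(capitalize_each_sentence(sample))
print(capitalize_each_sentence(sample, terminators=".!?"))
```

The first call changes only the start of the string, while the other two also capitalise after whichever terminators are configured. Deciding what counts as a sentence end is the part that makes a faithful binding non-trivial.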



--
This message was sent by Atlassian Jira
(v8.3.4#803005)