Posted to jira@arrow.apache.org by "Dragoș Moldovan-Grünfeld (Jira)" <ji...@apache.org> on 2021/11/25 11:31:00 UTC
[jira] [Updated] (ARROW-13615) [R] Bindings for stringr::str_to_sentence
[ https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dragoș Moldovan-Grünfeld updated ARROW-13615:
---------------------------------------------
Parent: ARROW-14865
Issue Type: Sub-task (was: Improvement)
> [R] Bindings for stringr::str_to_sentence
> -----------------------------------------
>
> Key: ARROW-13615
> URL: https://issues.apache.org/jira/browse/ARROW-13615
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: R
> Reporter: Nicola Crane
> Assignee: Dragoș Moldovan-Grünfeld
> Priority: Major
> Labels: good-first-issue, kernel, pull-request-available
> Fix For: 7.0.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> There is more to this issue than meets the eye. {{stringr::str_to_sentence()}} does two things:
> * capitalises the first word
> * if multiple sentences are provided as a single string, attempts to find sentence breaks and capitalises the first word of each sentence.
> The {{stringr}} implementation wraps {{stringi::stri_trans_totitle()}}, which in turn uses ICU’s BreakIterator to locate specific text boundaries. As a consequence, {{stringr::str_to_sentence()}} is not able to identify a full stop / period (".") as a sentence end and does not capitalise words following it. Thus, there is a discrepancy between the behaviour of the {{utf8_capitalize}} kernel (which capitalises the first word of a string without making any attempt to break it into sentences) and the behaviour of {{stringr::str_to_sentence()}}.
> For more extensive discussions around the {{stringi / stringr}} implementation see {{stringr}} issues [202|https://github.com/tidyverse/stringr/issues/202] and [231|https://github.com/tidyverse/stringr/issues/231].
> Due to the complexity of this issue and the relatively niche use cases, the recommendation is to postpone implementation.
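To make the difference between whole-string capitalisation and sentence-aware capitalisation concrete, here is a minimal Python sketch. It is not the stringr, stringi, or Arrow implementation: capitalize_string only approximates a whole-string capitalise kernel, and naive_sentence_case uses a hypothetical, deliberately simplistic sentence rule (terminal punctuation followed by whitespace) rather than ICU's boundary analysis. Both function names are made up for illustration.

```python
import re


def capitalize_string(text: str) -> str:
    """Rough stand-in for a whole-string capitalise kernel:
    uppercase the first character and lowercase everything else,
    with no attempt to detect sentences."""
    return text.capitalize()


# Hypothetical, naive rule (an assumption for this sketch): a sentence ends
# at '.', '!' or '?' when that character is followed by whitespace.
# The capture group keeps the whitespace so it can be re-joined unchanged.
_SENTENCE_GAP = re.compile(r"(?<=[.!?])(\s+)")


def naive_sentence_case(text: str) -> str:
    """Capitalise the first character of every naively detected sentence."""
    parts = _SENTENCE_GAP.split(text)
    # split() returns [sentence, gap, sentence, gap, ...]:
    # even indices are sentences, odd indices are the whitespace between them.
    return "".join(
        part.capitalize() if index % 2 == 0 else part
        for index, part in enumerate(parts)
    )


if __name__ == "__main__":
    sample = "hello world. this is arrow."
    print(capitalize_string(sample))
    print(naive_sentence_case(sample))
```

For the input "hello world. this is arrow.", the first function leaves "this" in lower case, while the second capitalises it. The naive rule would also split after abbreviations such as "e.g. ", which hints at why robust sentence detection is harder than it first appears.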
--
This message was sent by Atlassian Jira
(v8.20.1#820001)