Posted to issues@commons.apache.org by "Simon Spero (JIRA)" <ji...@apache.org> on 2018/06/07 16:25:00 UTC
[jira] [Commented] (IO-577) Add readers to filter out given
characters: CharacterSetFilterReader and CharacterFilterReader.
[ https://issues.apache.org/jira/browse/IO-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504882#comment-16504882 ]
Simon Spero commented on IO-577:
--------------------------------
A few comments:
1. APIs like java.util.stream and RxJava have filter methods that work in the opposite sense to the filter method introduced here: they select the items that match the test, rather than excluding them.
2. The documentation refers to "codepoints"; however, the read method in java.io.FilterReader returns UTF-16 code units (Java chars). This makes a difference for characters outside the BMP, which Java represents as surrogate pairs. The current implementation can't filter code points like 😠 (U+1F620) because it only sees the UTF-16 surrogates.
Working with code points would potentially require interposing a pushback reader to handle the case where the input contains a code point that is encoded in more than one char and is not rejected by the filter.
3. Commons IO is currently using Java 7. If the source level were to change to Java 8, the filter method could be replaced by an IntPredicate / Predicate<Integer> (passed in when the class is constructed). The current cases could be handled using a method reference or Predicate.isEqual.
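To make points 2 and 3 concrete, here is a rough sketch of what a predicate-driven reader might look like under Java 8. It is not the code in the attached patch: the class name PredicateFilterReader and the helper removeMatching are made up for illustration, and the predicate is applied to individual chars, so it shows the surrogate limitation rather than fixing it.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.IntPredicate;

// Hypothetical class, not part of Commons IO or the attached patch.
// Drops every char for which the supplied predicate returns true.
class PredicateFilterReader extends FilterReader {
    private final IntPredicate exclude;

    PredicateFilterReader(Reader in, IntPredicate exclude) {
        super(in);
        this.exclude = exclude;
    }

    @Override
    public int read() throws IOException {
        int ch;
        do {
            ch = in.read();                        // one UTF-16 code unit, or -1 at end
        } while (ch != -1 && exclude.test(ch));    // skip excluded chars
        return ch;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = 0;
        while (n < len) {
            int ch = read();
            if (ch == -1) {
                break;
            }
            cbuf[off + n++] = (char) ch;
        }
        return (n == 0 && len > 0) ? -1 : n;       // -1 only when nothing was read at end
    }

    // Hypothetical convenience helper: run a whole string through the reader.
    static String removeMatching(String input, IntPredicate exclude) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new PredicateFilterReader(new StringReader(input), exclude)) {
            for (int ch = r.read(); ch != -1; ch = r.read()) {
                sb.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Single-character case expressed as a lambda.
        System.out.println(removeMatching("a,b,c", ch -> ch == ','));             // abc
        // Method reference, resolved to Character.isWhitespace(int).
        System.out.println(removeMatching("a b\tc", Character::isWhitespace));    // abc
        // U+1F620 arrives as two surrogate chars, so a char-level test never matches it.
        String angry = new String(Character.toChars(0x1F620));
        System.out.println(removeMatching("x" + angry + "y", ch -> ch == 0x1F620).length()); // 4
    }
}
```

The last call returns the input unchanged (four chars), because the predicate is only ever shown the two surrogates and never the value 0x1F620. A code-point-aware variant would need some form of lookahead, as noted in point 2. The sketch keeps the "exclude on match" sense of the proposed readers; flipping it to match java.util.stream would only require negating the predicate.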
> Add readers to filter out given characters: CharacterSetFilterReader and CharacterFilterReader.
> -----------------------------------------------------------------------------------------------
>
> Key: IO-577
> URL: https://issues.apache.org/jira/browse/IO-577
> Project: Commons IO
> Issue Type: New Feature
> Components: Filters
> Reporter: Gary Gregory
> Assignee: Gary Gregory
> Priority: Major
> Fix For: 2.7
>
> Attachments: commons-io-577.patch
>
>
> Add readers to filter out given characters, handy to remove known junk characters from CSV files for example. Please see attached.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)