Posted to issues@commons.apache.org by "Simon Spero (JIRA)" <ji...@apache.org> on 2018/06/07 16:25:00 UTC

[jira] [Commented] (IO-577) Add readers to filter out given characters: CharacterSetFilterReader and CharacterFilterReader.

    [ https://issues.apache.org/jira/browse/IO-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504882#comment-16504882 ] 

Simon Spero commented on IO-577:
--------------------------------

A few comments:
1. APIs like java.util.stream and RxJava have filter methods that work in the opposite sense to the filter method introduced here: they select items that match the test, rather than excluding them.

2. The documentation refers to "codepoints"; however, the read method in java.io.FilterReader returns UTF-16 code units (Java chars), not codepoints. This makes a difference for characters that aren't in the BMP, which are represented in Java as surrogate pairs. The current implementation can't filter codepoints like 😭 (U+1F62D) because it only sees the individual UTF-16 surrogates.
Working with codepoints would potentially require interposing a pushback reader to handle the case where the input contains a codepoint that is encoded in more than one char and is not rejected, since both chars must then still be returned to the caller.

3. Commons IO is currently using Java 7. If the source level were to change to Java 8, then the filter method could be replaced by an IntPredicate / Predicate<Integer> (passed in when the class is constructed). The current cases could be handled using a method reference or Predicate.isEqual.
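
To illustrate point 2, here is a minimal sketch of what a Reader actually hands back for a character outside the BMP. The class name SurrogateDemo and the helper codeUnitsOf are made up for this example and are not part of the patch; only standard java.io calls are used.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of Commons IO.
public class SurrogateDemo {

    /** Reads the text one char at a time and returns each value as 4-digit hex. */
    static String codeUnitsOf(String text) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new StringReader(text)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("%04X", c));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a", then U+1F62D, then "b": three codepoints but four chars.
        String text = "a" + new String(Character.toChars(0x1F62D)) + "b";
        System.out.println(codeUnitsOf(text));
        System.out.println(text.codePointCount(0, text.length()));
        System.out.println(text.length());
    }
}
```

read() never returns 0x1F62D itself, only the two surrogates D83D and DE2D, so a filter that compares each returned value against a codepoint cannot match it.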
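
To make point 3 concrete, here is a rough sketch of a reader that takes an IntPredicate at construction time, assuming a Java 8 source level. The class name SkippingReader and the helper removeMatching are hypothetical and are not what the attached patch contains. The predicate is deliberately documented as "skip when true" to avoid the naming ambiguity from point 1, and the sketch still works on chars, so it does not address point 2.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.IntPredicate;

// Hypothetical sketch: drops every char for which the predicate returns true.
public class SkippingReader extends FilterReader {

    private final IntPredicate skip;

    public SkippingReader(Reader in, IntPredicate skip) {
        super(in);
        this.skip = skip;
    }

    @Override
    public int read() throws IOException {
        int ch;
        do {
            ch = in.read();
        } while (ch != -1 && skip.test(ch));
        return ch;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // Simple but slow: fills the buffer one filtered char at a time.
        int count = 0;
        while (count < len) {
            int ch = read();
            if (ch == -1) {
                break;
            }
            cbuf[off + count++] = (char) ch;
        }
        return (count == 0 && len > 0) ? -1 : count;
    }

    /** Convenience helper for the examples: returns text minus the skipped chars. */
    static String removeMatching(String text, IntPredicate skip) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new SkippingReader(new StringReader(text), skip)) {
            int ch;
            while ((ch = reader.read()) != -1) {
                sb.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Single character, expressed as a lambda.
        System.out.println(removeMatching("a,b;c", ch -> ch == ','));
        // Character class, expressed as a method reference.
        System.out.println(removeMatching("a1b22c", Character::isDigit));
    }
}
```

The single-character and character-set cases then become different predicates passed to one class rather than separate subclasses. skip(long) is left as inherited, so it counts unfiltered chars; a real implementation would need to decide how to handle that.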


> Add readers to filter out given characters: CharacterSetFilterReader and CharacterFilterReader.
> -----------------------------------------------------------------------------------------------
>
>                 Key: IO-577
>                 URL: https://issues.apache.org/jira/browse/IO-577
>             Project: Commons IO
>          Issue Type: New Feature
>          Components: Filters
>            Reporter: Gary Gregory
>            Assignee: Gary Gregory
>            Priority: Major
>             Fix For: 2.7
>
>         Attachments: commons-io-577.patch
>
>
> Add readers to filter out given characters,  handy to remove known junk characters from CSV files for example. Please see attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)