You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Gary Gregory (JIRA)" <ji...@apache.org> on 2014/04/07 16:44:23 UTC

[jira] [Comment Edited] (CSV-110) Add ability to parse single lines

    [ https://issues.apache.org/jira/browse/CSV-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961889#comment-13961889 ] 

Gary Gregory edited comment on CSV-110 at 4/7/14 2:44 PM:
----------------------------------------------------------

Hello Gabriel,

Thank you for the polite discussion :)

I still am not a fan of this add on code, because it feels like an independent add-on. Also, patch has a small bug: it assumes the record separator is \n.

OTOH, your user story can be handled out of the box if we make {{CSVParser.nextRecord()}} public like so (in SVN):

{code:java}
    @Test
    public void testGetOneLineOneParser() throws IOException {
        PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer);
        final CSVFormat format = CSVFormat.DEFAULT;
        final CSVParser parser = new CSVParser(reader, format);
        try {
            writer.append(CSV_INPUT_1);
            writer.append(format.getRecordSeparator());
            final CSVRecord record1 = parser.nextRecord();
            assertArrayEquals(RESULT[0], record1.values());
            writer.append(CSV_INPUT_2);
            writer.append(format.getRecordSeparator());
            final CSVRecord record2 = parser.nextRecord();
            assertArrayEquals(RESULT[1], record2.values());
        } finally {
            parser.close();
        }
    }
{code}

Your patch does not create one parser for each record of course, but it still creates one {{StringReader}} for each record. So we still have some extra objects generated.

There might be other kinds of JRE readers and writers you could use, but this example works nicely and with the only change being {{nextRecord()}} as a public method.

In this example, the piped reader and writer do more than you need (threading) but I could see that having a simpler piped reader and writer would be more generic and useful than the add-on patch.

That's my 2c of course.

I do not want to open the door to a "one line parser" when next someone will want a "two line parser", or whatnot, you get the idea.

This can all be reconsidered post 1.0 of course. It would be nice to get 1.0 out and done ASAP IMO.


was (Author: garydgregory):
Hello Gabriel,

Thank you for the polite discussion :)

I still am not a fan of this add on code, because it feels like an independent add-on. Also, patch has a small bug: it assumes the record separator is \n.

OTOH, your user story can be handled out of the box if we make {{CSVParser.nextRecord()}} public like so (in SVN):

{code:java}
    @Test
    public void testGetOneLineOneParser() throws IOException {
        PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer);
        final CSVFormat format = CSVFormat.DEFAULT;
        final CSVParser parser = new CSVParser(reader, format);
        try {
            writer.append(CSV_INPUT_1);
            writer.append(format.getRecordSeparator());
            final CSVRecord record1 = parser.nextRecord();
            assertArrayEquals(RESULT[0], record1.values());
            writer.append(CSV_INPUT_2);
            writer.append(format.getRecordSeparator());
            final CSVRecord record2 = parser.nextRecord();
            assertArrayEquals(RESULT[1], record2.values());
        } finally {
            parser.close();
        }
    }
{code}

You patch does not create one parser for each record of course, but it still creates one {{StringReader}} for each record. So we still have some extra objects generated.

There might be other kinds of JRE readers and writers you could use, but this example works nicely and with the only change being {{nextRecord()}} as a public method.

In this example, the piped reader and writer do more than you need (threading) but I could see that having a simpler piped reader and writer would be more generic and useful than the add-on patch.

That's my 2c of course.

I do not want to open the door to a "one line parser" when next someone will want a "two line parser", or whatnot, you get the idea.

This can all be reconsidered post 1.0 of course. It would be nice to get 1.0 out and done ASAP IMO.

> Add ability to parse single lines
> ---------------------------------
>
>                 Key: CSV-110
>                 URL: https://issues.apache.org/jira/browse/CSV-110
>             Project: Commons CSV
>          Issue Type: New Feature
>            Reporter: Gabriel Reid
>         Attachments: CSV-110.patch
>
>
> Due to the iterator-based API of CSVParser, there is currently no simple and convenient way to parse single lines of CSV-formatted data. The intention of this ticket is to add something along the lines of the following:
> {code}
> CSVLineParser lineParser = new CSVLineParser(csvFormat);
> String singleLine = "a,b,c";
> CSVRecord singleRecord lineParser.parseLine(singleLine);
> {code}
> The use case of parsing single lines comes up very often in terms of distributed batch processing scenarios (i.e. Hadoop jobs), and CSV-style formats are also regularly used in such scenarios. Currently, projects are often forced to build their own ad-hoc CSV parsing solutions, so adding the ability to parse single lines to commons-csv would be very useful to these projects, as well as anyone doing parsing based on input that isn't necessary in the form of a single stream.



--
This message was sent by Atlassian JIRA
(v6.2#6252)