Posted to dev@mahout.apache.org by Alexander Franchuk <Al...@icims.com> on 2013/07/18 19:23:53 UTC

classifier.sgd.CsvRecordFactory incorrect CSV parsing

Hi All,
I've been working with Mahout during an internship this summer, and in the process I noticed that the CsvRecordFactory class parses CSV files incorrectly. I wrote a fix, which is in the attached patch file. It's not a huge change, but I thought it would be helpful. It should also stop the demo programs in the Mahout distribution from failing on such files. For instance, if a double-quoted field contains a comma, the demo programs incorrectly split the field in two. In some cases this causes parsing failures, and even when the program doesn't fail, it produces incorrect results.

With this patch, the class uses the solr-commons-csv.jar file, which I noticed is already included in the Mahout distribution.
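
To make the failure mode concrete, here is a minimal sketch of the difference between splitting on every comma and splitting with quote awareness. It is an illustration only: it is not the attached patch, it does not use the solr-commons-csv API, and the class and method names (CsvSplitSketch, naiveSplit, quoteAwareSplit) are hypothetical. It assumes the common convention that a doubled quote inside a quoted field stands for a literal quote.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: hypothetical names, not the patch and not the solr-commons-csv API.
final class CsvSplitSketch {

    // Naive approach: split on every comma, ignoring quotes entirely.
    static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    // Quote-aware approach: a comma inside double quotes stays in the field,
    // and a doubled quote ("") inside a quoted field becomes one literal quote.
    static List<String> quoteAwareSplit(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",42";
        System.out.println("naive field count: " + naiveSplit(line).size());
        System.out.println("quote-aware field count: " + quoteAwareSplit(line).size());
    }
}
```

For the sample line, the naive split yields four fields because the quoted name is cut at its embedded comma, while the quote-aware split yields the intended three. A real parser also has to handle cases this sketch ignores, such as quoted fields that span multiple lines.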

Hope this helps! And thanks for all your work. My experience with Mahout has been great so far.
Alex Franchuk

RE: classifier.sgd.CsvRecordFactory incorrect CSV parsing

Posted by Alexander Franchuk <Al...@icims.com>.
Hi Sebastian,
I just noticed that. My mistake, I will open a ticket.

Thanks,
Alex

-----Original Message-----
From: Sebastian Schelter [mailto:ssc@apache.org] 
Sent: Thursday, July 18, 2013 1:33 PM
To: dev@mahout.apache.org
Subject: Re: classifier.sgd.CsvRecordFactory incorrect CSV parsing

Hello Alex,

Thank you for being willing to contribute. Unfortunately, you cannot send attachments via this list. Could you open a JIRA ticket at https://issues.apache.org/jira/browse/MAHOUT and upload your patch there?

-sebastian



Re: classifier.sgd.CsvRecordFactory incorrect CSV parsing

Posted by Sebastian Schelter <ss...@apache.org>.
Hello Alex,

Thank you for being willing to contribute. Unfortunately, you cannot send
attachments via this list. Could you open a JIRA ticket at
https://issues.apache.org/jira/browse/MAHOUT and upload your patch there?

-sebastian

