Posted to dev@crunch.apache.org by "Danny Morgan (JIRA)" <ji...@apache.org> on 2014/12/01 21:57:12 UTC

[jira] [Commented] (CRUNCH-429) The CSVFileSource does not always function properly

    [ https://issues.apache.org/jira/browse/CRUNCH-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230442#comment-14230442 ] 

Danny Morgan commented on CRUNCH-429:
-------------------------------------

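[~mkwhitacre] Moving the FileSystem retrieval outside the loop broke reading from alternative filesystem sources.

If the Crunch job is running on a Hadoop cluster and the input paths are on S3, then:

{code:java}
FileSystem fileSystem = FileSystem.get(job.getConfiguration());
{code}

isn't correct. It returns the default filesystem from the job configuration (HDFS on a typical cluster) rather than the filesystem each input path belongs to, so the S3 paths get opened against the wrong filesystem. The lookup needs to stay per path, e.g. via Path.getFileSystem(Configuration).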
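
To illustrate the difference, here is a minimal, Hadoop-free sketch that models the two lookups using only java.net.URI. The class name, the method names, the namenode address, and the bucket are all hypothetical and only stand in for the real calls; this is not the CSVInputFormat code.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class FsResolutionSketch {

    // Models the hoisted FileSystem.get(conf) call: it never looks at the
    // path and always answers with the configured default filesystem.
    public static String defaultFsScheme(URI defaultFs) {
        return defaultFs.getScheme();
    }

    // Models a per-path lookup: use the path's own scheme, and fall back to
    // the default filesystem only for scheme-less paths such as /data/in.csv.
    public static String schemeForPath(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        return scheme != null ? scheme : defaultFs.getScheme();
    }

    public static void main(String[] args) {
        // Hypothetical cluster default and input paths, for illustration only.
        URI defaultFs = URI.create("hdfs://namenode:8020");
        List<URI> inputs = Arrays.asList(
            URI.create("s3n://some-bucket/input/a.csv"),
            URI.create("/user/someone/input/b.csv"));

        for (URI input : inputs) {
            System.out.println(input
                + " -> hoisted lookup: " + defaultFsScheme(defaultFs)
                + ", per-path lookup: " + schemeForPath(input, defaultFs));
        }
    }
}
```

In the real input format the per-path step would presumably be Path.getFileSystem(job.getConfiguration()) called inside the loop, which is what the sketch's schemeForPath is standing in for.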

> The CSVFileSource does not always function properly
> ---------------------------------------------------
>
>                 Key: CRUNCH-429
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-429
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.3
>            Reporter: mac champion
>            Assignee: mac champion
>            Priority: Minor
>              Labels: csv, csvparser
>             Fix For: 0.8.4, 0.11.0
>
>         Attachments: 0001-CRUNCH-429-Fix-CSVInputFormat.patch, CRUNCH-429_a.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The "configure" method of CSVInputFormat does not have any effect on its configuration and is never called. Instead, the class needs to implement Configurable and set its configuration options in an overridden setConf method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)