
Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/17 12:37:42 UTC

[jira] [Commented] (NUTCH-1541) Indexer plugin to write CSV

    [ https://issues.apache.org/jira/browse/NUTCH-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797768#comment-13797768 ] 

Julien Nioche commented on NUTCH-1541:
--------------------------------------

Hi 

Line 342 needs to be:
{code}
while (nextQuoteChar > 0 && nextQuoteChar < max) {
{code}

Am I right in thinking that it writes its output to the local file system only? When it is used in deployed mode, won't it create one local file per reducer? If so, we should make this very explicit in a README file.

Just thinking aloud here, but what's preventing us from relying on the standard TextOutputFormat and putting things on HDFS if the configuration says so? Is it because the IndexingJob sets a dummy FileOutputPath and the IndexWriters know nothing about it?

Maybe it would be good to have some abstract class for text-based index writers to facilitate writing new ones, e.g. XML, JSON, etc.?
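
To make the abstract-class idea concrete, here is a rough sketch of what such a base class might look like. It is deliberately independent of the Nutch IndexWriter API: the names AbstractTextIndexWriter, CsvTextIndexWriter and TextIndexWriterDemo are hypothetical, a document is assumed to be a plain map of field name to values, and the "|" separator for multi-valued fields is just a choice made for the example. The point is only that a subclass would implement a single formatting method while open/write/close are shared.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class (not part of Nutch): it owns the output stream and
// writes one line per document; subclasses only decide how a document is rendered.
abstract class AbstractTextIndexWriter {
  private Writer out;

  void open(Writer out) {
    this.out = out;
  }

  void write(Map<String, List<String>> doc) throws IOException {
    out.write(formatDocument(doc));
    out.write('\n');
  }

  void close() throws IOException {
    out.close();
  }

  // The only method a new text format (CSV, JSON, XML, ...) has to provide.
  protected abstract String formatDocument(Map<String, List<String>> doc);
}

// Hypothetical CSV subclass: fixed field order, every field quoted,
// embedded quotes doubled, multiple values joined with '|' (an assumption).
class CsvTextIndexWriter extends AbstractTextIndexWriter {
  private final List<String> fields;

  CsvTextIndexWriter(List<String> fields) {
    this.fields = fields;
  }

  @Override
  protected String formatDocument(Map<String, List<String>> doc) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.size(); i++) {
      if (i > 0) {
        sb.append(',');
      }
      List<String> values = doc.getOrDefault(fields.get(i), Collections.<String>emptyList());
      String joined = String.join("|", values);
      sb.append('"').append(joined.replace("\"", "\"\"")).append('"');
    }
    return sb.toString();
  }
}

// Small usage example that writes one document into an in-memory sink.
class TextIndexWriterDemo {
  static String demo() throws IOException {
    StringWriter sink = new StringWriter();
    AbstractTextIndexWriter writer = new CsvTextIndexWriter(Arrays.asList("url", "title"));
    writer.open(sink);
    Map<String, List<String>> doc = new HashMap<>();
    doc.put("url", Arrays.asList("http://example.org/"));
    doc.put("title", Arrays.asList("A \"quoted\" title"));
    writer.write(doc);
    writer.close();
    return sink.toString();
  }
}
```

With something along these lines, a JSON or XML writer would only need its own formatDocument, and where the Writer points to (local file or HDFS) would stay outside the format-specific code.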





> Indexer plugin to write CSV
> ---------------------------
>
>                 Key: NUTCH-1541
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1541
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 1.7
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.8
>
>         Attachments: NUTCH-1541-v1.patch, NUTCH-1541-v2.patch
>
>
> With the new pluggable indexer a simple plugin would be handy to write configurable fields into a CSV file - for further analysis or just for export.


