Posted to user@nutch.apache.org by Vicente Canhoto <vi...@gmail.com> on 2012/03/24 18:14:50 UTC

Older plugin in Nutch 1.4

Hi there,

I'm trying to use a plugin in Nutch 1.4 that was written (not by me) for
an older version, possibly 1.2 or 1.3. When I tried to build the plugin, it
failed with errors about Nutch classes that aren't present in this version
(mostly Lucene-related, from what I can tell). I searched but didn't find a
way to adapt this plugin to version 1.4.
Any help would be much appreciated.

The source code is the following:

package org.apache.nutch.indexer.mp3;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.lucene.document.DateTools;
import org.apache.nutch.metadata.Nutch;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.indexer.IndexingFilter;

import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.indexer.lucene.LuceneWriter;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import java.net.MalformedURLException;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.metadata.Metadata;

public class Mp3IndexingFilter implements IndexingFilter {
    private static final Log LOG =
LogFactory.getLog(Mp3IndexingFilter.class);
    private static final String MP3_TRACK_TITLE = "track_title";
    private static final String MP3_ALBUM = "album";
    private static final String MP3_ARTIST = "artist";
    private static final String MP3_GENRE = "genre";
    private static final String MP3_RELEASE_DATE = "releaseDate";
    private Configuration conf;

    public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {
        // look up email of the author based on the url of the site
        //String creatorEmail = EmailLookup.getCreatorEmail(url.toString());
        Metadata metadata = parse.getData().getParseMeta();

        // declare the locals; the posted code used them without a type
        String mp3Title = metadata.get("title");
        String mp3Album = metadata.get("xmpDM:album");
        String mp3Artist = metadata.get("xmpDM:artist");
        String mp3Genre = metadata.get("xmpDM:genre");
        String mp3releaseDate = metadata.get("xmpDM:releaseDate");
        LOG.info("######## mp3Title = " + mp3Title);
        LOG.info("######## mp3Album = " + mp3Album);
        LOG.info("######## mp3Artist = " + mp3Artist);
        LOG.info("######## mp3Genre = " + mp3Genre);
        LOG.info("######## mp3releaseDate = " + mp3releaseDate);

        if (mp3Title != null) {
            doc.add(MP3_TRACK_TITLE, mp3Title);
        }
        if (mp3Album != null) {
            doc.add(MP3_ALBUM, mp3Album);
        }
        if (mp3Artist != null) {
            doc.add(MP3_ARTIST, mp3Artist);
        }
        if (mp3Genre != null) {
            doc.add(MP3_GENRE, mp3Genre);
        }
        if (mp3releaseDate != null) {
            doc.add(MP3_RELEASE_DATE, mp3releaseDate);
        }
        return doc;
    }

    public void addIndexBackendOptions(Configuration conf) {
        LuceneWriter.addFieldOptions(MP3_TRACK_TITLE,
LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
        LuceneWriter.addFieldOptions(MP3_ALBUM, LuceneWriter.STORE.YES,
LuceneWriter.INDEX.TOKENIZED, conf);
        LuceneWriter.addFieldOptions(MP3_ARTIST, LuceneWriter.STORE.YES,
LuceneWriter.INDEX.TOKENIZED, conf);
        LuceneWriter.addFieldOptions(MP3_GENRE, LuceneWriter.STORE.YES,
LuceneWriter.INDEX.TOKENIZED, conf);
        LuceneWriter.addFieldOptions(MP3_RELEASE_DATE,
LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
    }

    public Configuration getConf() {
        return conf;
    }
    public void setConf(Configuration conf) {
        this.conf = conf;
    }
}


Thanks in advance,
Vicente Canhoto

Re: Older plugin in Nutch 1.4

Posted by Vicente Canhoto <vi...@gmail.com>.
Thank you very much, I got it working now!

On 26 March 2012 15:26, webdev1977 <we...@gmail.com> wrote:


Re: Older plugin in Nutch 1.4

Posted by webdev1977 <we...@gmail.com>.
I believe it is complaining about this:

public void addIndexBackendOptions(Configuration conf) {
    LuceneWriter.addFieldOptions(MP3_TRACK_TITLE,
        LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
    LuceneWriter.addFieldOptions(MP3_ALBUM,
        LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
    LuceneWriter.addFieldOptions(MP3_ARTIST,
        LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
    LuceneWriter.addFieldOptions(MP3_GENRE,
        LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
    LuceneWriter.addFieldOptions(MP3_RELEASE_DATE,
        LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
}

You no longer need to do this in your plugin.

Instead, you would replace it with something like this:

private static final String MP3_TRACK_TITLE = "track_title";

public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {

  doc.add(MP3_TRACK_TITLE, "title_for_this_track_goes_here");

  // return the document so it continues through the indexing chain
  return doc;
}

Basically, don't use any of the LuceneWriter classes.
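
To make the idea concrete, here is a minimal sketch of the part of the filter that survives the change: copying each parse-metadata value into an index field, with no Lucene field options at all. It is an illustration only. Plain java.util maps stand in for Nutch's Metadata and NutchDocument so it runs without Nutch on the classpath; the class name Mp3FieldSketch, the FIELD_FOR_KEY table and the sample values are hypothetical, while the metadata keys and field names are the ones from the plugin posted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in sketch: plain maps replace Nutch's Metadata and NutchDocument.
// This is NOT the Nutch API, only the field-copying logic of the filter.
class Mp3FieldSketch {

    // Parse-metadata key -> index field name, as used in the posted plugin.
    static final Map<String, String> FIELD_FOR_KEY = new LinkedHashMap<>();
    static {
        FIELD_FOR_KEY.put("title", "track_title");
        FIELD_FOR_KEY.put("xmpDM:album", "album");
        FIELD_FOR_KEY.put("xmpDM:artist", "artist");
        FIELD_FOR_KEY.put("xmpDM:genre", "genre");
        FIELD_FOR_KEY.put("xmpDM:releaseDate", "releaseDate");
    }

    // Copies every metadata value that is present into the document and
    // skips missing keys, mirroring the null checks in the posted filter.
    static Map<String, String> filter(Map<String, String> doc,
                                      Map<String, String> parseMeta) {
        for (Map.Entry<String, String> e : FIELD_FOR_KEY.entrySet()) {
            String value = parseMeta.get(e.getKey());
            if (value != null) {
                doc.put(e.getValue(), value);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("title", "Example Track");          // hypothetical sample value
        meta.put("xmpDM:artist", "Example Artist");  // hypothetical sample value
        System.out.println(filter(new LinkedHashMap<>(), meta));
    }
}
```

In the real plugin the same loop body would call doc.add(field, value) on the NutchDocument, as in the snippet above, and the addIndexBackendOptions method would simply be deleted.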



--
View this message in context: http://lucene.472066.n3.nabble.com/Older-plugin-in-Nutch-1-4-tp3854202p3858292.html
Sent from the Nutch - User mailing list archive at Nabble.com.