Posted to user@nutch.apache.org by Vicente Canhoto <vi...@gmail.com> on 2012/03/24 18:14:50 UTC
Older plugin in Nutch 1.4
Hi there,
I'm trying to use a plugin in Nutch 1.4 that was written (not by me) for
an older version, possibly 1.2 or 1.3. When I tried to build the plugin, it
failed with errors related to Nutch classes that aren't present
in this version (mostly Lucene-related, from what I can tell). I did some
searching but didn't find a way to adapt this plugin to version 1.4.
Any help would be much appreciated.
The source code is the following:
package org.apache.nutch.indexer.mp3;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.lucene.document.DateTools;
import org.apache.nutch.metadata.Nutch;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.indexer.lucene.LuceneWriter;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import java.net.MalformedURLException;
import java.net.URL;
import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.metadata.Metadata;

public class Mp3IndexingFilter implements IndexingFilter {

  private static final Log LOG =
      LogFactory.getLog(Mp3IndexingFilter.class);

  private static final String MP3_TRACK_TITLE = "track_title";
  private static final String MP3_ALBUM = "album";
  private static final String MP3_ARTIST = "artist";
  private static final String MP3_GENRE = "genre";
  private static final String MP3_RELEASE_DATE = "releaseDate";

  private Configuration conf;

  public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
      CrawlDatum datum, Inlinks inlinks) throws IndexingException {
    // look up email of the author based on the url of the site
    // String creatorEmail = EmailLookup.getCreatorEmail(url.toString());

    // read the MP3 tags from the parse metadata
    Metadata metadata = parse.getData().getParseMeta();
    String mp3Title = metadata.get("title");
    String mp3Album = metadata.get("xmpDM:album");
    String mp3Artist = metadata.get("xmpDM:artist");
    String mp3Genre = metadata.get("xmpDM:genre");
    String mp3releaseDate = metadata.get("xmpDM:releaseDate");

    LOG.info("######## mp3Title = " + mp3Title);
    LOG.info("######## mp3Album = " + mp3Album);
    LOG.info("######## mp3Artist = " + mp3Artist);
    LOG.info("######## mp3Genre = " + mp3Genre);
    LOG.info("######## mp3releaseDate = " + mp3releaseDate);

    // add only the tags that are present
    if (mp3Title != null) {
      doc.add(MP3_TRACK_TITLE, mp3Title);
    }
    if (mp3Album != null) {
      doc.add(MP3_ALBUM, mp3Album);
    }
    if (mp3Artist != null) {
      doc.add(MP3_ARTIST, mp3Artist);
    }
    if (mp3Genre != null) {
      doc.add(MP3_GENRE, mp3Genre);
    }
    if (mp3releaseDate != null) {
      doc.add(MP3_RELEASE_DATE, mp3releaseDate);
    }
    return doc;
  }

  public void addIndexBackendOptions(Configuration conf) {
    LuceneWriter.addFieldOptions(MP3_TRACK_TITLE,
        LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
    LuceneWriter.addFieldOptions(MP3_ALBUM, LuceneWriter.STORE.YES,
        LuceneWriter.INDEX.TOKENIZED, conf);
    LuceneWriter.addFieldOptions(MP3_ARTIST, LuceneWriter.STORE.YES,
        LuceneWriter.INDEX.TOKENIZED, conf);
    LuceneWriter.addFieldOptions(MP3_GENRE, LuceneWriter.STORE.YES,
        LuceneWriter.INDEX.TOKENIZED, conf);
    LuceneWriter.addFieldOptions(MP3_RELEASE_DATE,
        LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
  }

  public Configuration getConf() {
    return conf;
  }

  public void setConf(Configuration conf) {
    this.conf = conf;
  }
}
Thanks in advance,
Vicente Canhoto
Re: Older plugin in Nutch 1.4
Posted by Vicente Canhoto <vi...@gmail.com>.
Thank you very much, I got it working now!
On 26 March 2012 15:26, webdev1977 <we...@gmail.com> wrote:
> I believe it is complaining about this:
>
> public void addIndexBackendOptions(Configuration conf) {
>   LuceneWriter.addFieldOptions(MP3_TRACK_TITLE,
>       LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
>   LuceneWriter.addFieldOptions(MP3_ALBUM, LuceneWriter.STORE.YES,
>       LuceneWriter.INDEX.TOKENIZED, conf);
>   LuceneWriter.addFieldOptions(MP3_ARTIST, LuceneWriter.STORE.YES,
>       LuceneWriter.INDEX.TOKENIZED, conf);
>   LuceneWriter.addFieldOptions(MP3_GENRE, LuceneWriter.STORE.YES,
>       LuceneWriter.INDEX.TOKENIZED, conf);
>   LuceneWriter.addFieldOptions(MP3_RELEASE_DATE,
>       LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
> }
>
> You no longer need to do this in your plugin.
>
> Instead, replace it with something like this:
>
> private static final String MP3_TRACK_TITLE = "track_title";
>
> public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
>     CrawlDatum datum, Inlinks inlinks) throws IndexingException {
>   doc.add(MP3_TRACK_TITLE, "title_for_this_track_goes_here");
>   return doc;
> }
>
> Basically, don't use any of the LuceneWriter classes.
Re: Older plugin in Nutch 1.4
Posted by webdev1977 <we...@gmail.com>.
I believe it is complaining about this:
public void addIndexBackendOptions(Configuration conf) {
  LuceneWriter.addFieldOptions(MP3_TRACK_TITLE,
      LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
  LuceneWriter.addFieldOptions(MP3_ALBUM, LuceneWriter.STORE.YES,
      LuceneWriter.INDEX.TOKENIZED, conf);
  LuceneWriter.addFieldOptions(MP3_ARTIST, LuceneWriter.STORE.YES,
      LuceneWriter.INDEX.TOKENIZED, conf);
  LuceneWriter.addFieldOptions(MP3_GENRE, LuceneWriter.STORE.YES,
      LuceneWriter.INDEX.TOKENIZED, conf);
  LuceneWriter.addFieldOptions(MP3_RELEASE_DATE,
      LuceneWriter.STORE.YES, LuceneWriter.INDEX.TOKENIZED, conf);
}
You no longer need to do this in your plugin.
Instead, replace it with something like this:
private static final String MP3_TRACK_TITLE = "track_title";

public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {
  doc.add(MP3_TRACK_TITLE, "title_for_this_track_goes_here");
  return doc;
}

Basically, don't use any of the LuceneWriter classes.
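
To make the advice concrete, here is a minimal sketch of the copying logic that is left in filter() once addIndexBackendOptions and the LuceneWriter import are gone. It is only an illustration and is written so that it compiles without the Nutch jars: the class name Mp3FieldCopier, the method copyFields and the plain Map types are hypothetical stand-ins for the plugin class, filter(), Nutch's Metadata and NutchDocument. The metadata keys and field names are the ones from the code in the original question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone sketch: plain Maps stand in for Nutch's Metadata and NutchDocument
// (an assumption made so the example compiles without Nutch on the classpath).
final class Mp3FieldCopier {

    // Parse-metadata key -> index field name, as used by the filter in the question.
    private static final Map<String, String> FIELD_BY_META_KEY = new LinkedHashMap<>();
    static {
        FIELD_BY_META_KEY.put("title", "track_title");
        FIELD_BY_META_KEY.put("xmpDM:album", "album");
        FIELD_BY_META_KEY.put("xmpDM:artist", "artist");
        FIELD_BY_META_KEY.put("xmpDM:genre", "genre");
        FIELD_BY_META_KEY.put("xmpDM:releaseDate", "releaseDate");
    }

    private Mp3FieldCopier() {}

    // Copies every mapped metadata value that is present into the document and
    // returns the same document, as filter() must return the document it was given.
    static Map<String, List<String>> copyFields(Map<String, String> parseMeta,
                                                Map<String, List<String>> doc) {
        for (Map.Entry<String, String> e : FIELD_BY_META_KEY.entrySet()) {
            String value = parseMeta.get(e.getKey());
            if (value != null) {
                // Missing tags are skipped, like the null checks in the original filter.
                doc.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(value);
            }
        }
        return doc;
    }
}
```

In the real plugin the loop body would call doc.add(fieldName, value) on the NutchDocument, with parse.getData().getParseMeta() as the metadata source, and no LuceneWriter call is involved anywhere.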
--
View this message in context: http://lucene.472066.n3.nabble.com/Older-plugin-in-Nutch-1-4-tp3854202p3858292.html
Sent from the Nutch - User mailing list archive at Nabble.com.