Posted to dev@nutch.apache.org by eyal edri <ey...@gmail.com> on 2007/10/10 12:59:47 UTC
download code works in fetch class but not in plugins class
Hi,
I've written a piece of code for downloading files from URLs (included below).
The code works well when I put it into the Fetcher.java source class, and it
captures the content types I want to download.
I want to move the code into the plugin source classes so that the files are
downloaded automatically by the matching plugin (such as zip; for exe I will
need to write a plugin).
But when I put it in, e.g., ZipParser.java (right at the beginning of the
"getParse(Content content)" function), it doesn't do anything. Any idea why?
The code:
public ParseResult getParse(final Content content) {
    String resultText = null;
    String resultTitle = null;
    Outlink[] outlinks = null;
    List outLinksList = new ArrayList();
    Properties properties = null;
    try {
        // my code:
        LOG.info("edri:: found file type:" + content.getContentType());
        Pattern regex = Pattern.compile("http://([^/]*).*/([^/]*)$");
        Matcher urlMatcher = regex.matcher(content.getUrl());
        String domain = null;
        String fileLast = null;
        // group(1) and group(2) are equivalent to $1 and $2 in a regex
        while (urlMatcher.find()) {
            domain = urlMatcher.group(1);
            fileLast = urlMatcher.group(2);
        }
        LOG.info("edri:: filename " + fileLast);
        LOG.info("edri:: domain " + domain);
        File downloadDir = new File("/home/eyale/nutch/DOWNLOADS/" + domain);
        if (!downloadDir.exists())
            downloadDir.mkdir();
        String filename = downloadDir + "/" + fileLast;
        LOG.info("edri:: saving filename: " + filename);
        byte[] contentBArray = content.getContent();
        FileOutputStream out = new FileOutputStream(new File(filename));
        for (int i = 0; i < contentBArray.length; i++) {
            out.write(contentBArray[i]);
        }
        out.close();
... the rest of the function here....
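To make the download step easier to test outside Nutch, here is a minimal standalone sketch of the same logic: split the URL with the regex above, create the per-domain directory, and write the bytes. It does not use any Nutch classes; the class name DownloadSketch, the methods splitUrl and save, and the baseDir parameter are hypothetical names for illustration, and the byte array stands in for content.getContent().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: isolates the "save bytes under
// <baseDir>/<domain>/<file>" step so it can be run and debugged on its own.
public class DownloadSketch {
    // Same pattern as in the post: group 1 is the host, group 2 the last path segment.
    private static final Pattern URL_PARTS =
            Pattern.compile("http://([^/]*).*/([^/]*)$");

    /** Returns {domain, fileName}, or null if the URL does not match the pattern. */
    public static String[] splitUrl(String url) {
        Matcher m = URL_PARTS.matcher(url);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    /** Writes bytes to baseDir/domain/fileName and returns the path written. */
    public static Path save(Path baseDir, String url, byte[] bytes) throws IOException {
        String[] parts = splitUrl(url);
        if (parts == null || parts[1].isEmpty()) {
            // Fail loudly instead of writing to a path built from null values.
            throw new IOException("cannot derive a file name from " + url);
        }
        Path dir = baseDir.resolve(parts[0]);
        // Unlike File.mkdir(), this also creates missing parent directories.
        Files.createDirectories(dir);
        Path target = dir.resolve(parts[1]);
        // Writes the whole array in one call rather than byte by byte.
        Files.write(target, bytes);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the DOWNLOADS directory.
        Path base = Files.createTempDirectory("downloads");
        Path saved = save(base, "http://example.com/files/archive.zip",
                new byte[] { 1, 2, 3 });
        System.out.println("saved " + saved + " (" + Files.size(saved) + " bytes)");
    }
}
```

Because the sketch throws when the URL does not match or the directory cannot be created, a failure shows up as an exception rather than as a silent no-op, which may help narrow down where the plugin version stops.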
thanks,
Eyal.
--
Eyal Edri