Posted to dev@nutch.apache.org by Brian Whitman <br...@variogr.am> on 2006/12/11 14:32:54 UTC
parse-mp3 plugin concatenating previous tags for text field
The parse-mp3 plugin seems to keep the text content from previous
parses. For every new mp3 file parsed, it puts the contents of all
the previously parsed files' text fields into the plain text field
for that file.
You can see this by fetching a set of mp3s in one segment, then
viewing their plain text in the Nutch webapp. The plain text of each
file will include the contents of all files fetched in that round,
which makes searching fruitless.
I made a tiny band-aid change to MP3Parser.java and
MetadataCollector.java against the nightly. It seems to fix the problem.
--- MP3Parser.java 2006-12-10 09:43:26.000000000 -0500
+++ MP3Parser.java.new 2006-12-10 16:37:03.000000000 -0500
@@ -67,7 +67,7 @@
       fos.write(raw);
       fos.close();
       MP3File mp3 = new MP3File(tmp);
-
+      metadataCollector.clearText();
       if (mp3.hasID3v2Tag()) {
         parse = getID3v2Parse(mp3, content.getMetadata());
       } else if (mp3.hasID3v1Tag()) {
--- MetadataCollector.java 2006-12-10 09:43:26.000000000 -0500
+++ MetadataCollector.java.new 2006-12-10 16:37:28.000000000 -0500
@@ -42,6 +42,10 @@
     this.conf = conf;
   }
 
+  public void clearText() {
+    text = "";
+  }
+
   public void notifyProperty(String name, String value) throws MalformedURLException {
     if (name.equals("TIT2-Text"))
       setTitle(value);
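
For anyone trying to picture the failure mode, here is a minimal standalone sketch of it. It does not use the real plugin classes: SketchCollector, SketchParser and StaleTextDemo are hypothetical stand-ins, and the way the text is appended is an assumption based only on the behaviour described above. It shows a parser that reuses one collector across files, with and without a clearText() call before each parse.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the plugin's MetadataCollector. Only the
// text-accumulating behaviour described in this thread is modelled.
class SketchCollector {
    private String text = "";

    // Assumption: every tag value is appended to one running text field.
    void notifyProperty(String name, String value) {
        text += value + "\n";
    }

    // Same idea as the clearText() added by the patch.
    void clearText() {
        text = "";
    }

    String getText() {
        return text;
    }
}

// Hypothetical stand-in for a parser that keeps a single collector
// for its whole lifetime, so state can leak from one file to the next.
class SketchParser {
    private final SketchCollector collector = new SketchCollector();
    private final boolean clearBeforeParse;

    SketchParser(boolean clearBeforeParse) {
        this.clearBeforeParse = clearBeforeParse;
    }

    // Returns the plain text produced for one "file" (a list of tag values).
    String parse(List<String> tagValues) {
        if (clearBeforeParse) {
            collector.clearText();
        }
        for (String value : tagValues) {
            collector.notifyProperty("tag", value);
        }
        return collector.getText();
    }
}

public class StaleTextDemo {
    public static void main(String[] args) {
        SketchParser unpatched = new SketchParser(false);
        unpatched.parse(Arrays.asList("Song A"));
        String leaked = unpatched.parse(Arrays.asList("Song B"));
        // The second file's text still carries the first file's tag.
        System.out.println(leaked.contains("Song A")); // prints true

        SketchParser patched = new SketchParser(true);
        patched.parse(Arrays.asList("Song A"));
        String clean = patched.parse(Arrays.asList("Song B"));
        System.out.println(clean.contains("Song A")); // prints false
    }
}
```

Another option, if it fits the plugin's design, might be to create a fresh collector for every parse instead of resetting a shared one. That would also avoid any other fields being carried over, but I have not checked whether anything depends on the shared instance.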
Re: parse-mp3 plugin concatenating previous tags for text field
Posted by Sami Siren <ss...@gmail.com>.
Could you please create a JIRA issue and attach this patch there so it
won't get lost? It also helps keep the CHANGES file up to date, since
you can just copy and paste from the issue when you do a commit.
--
Sami Siren