Posted to dev@nutch.apache.org by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2006/01/08 00:08:20 UTC
[jira] Created: (NUTCH-166) secure jobtracker info pages with a password
secure jobtracker info pages with a password
--------------------------------------------
Key: NUTCH-166
URL: http://issues.apache.org/jira/browse/NUTCH-166
Project: Nutch
Type: Improvement
Versions: 0.8-dev
Reporter: Stefan Groschupf
Fix For: 0.8-dev
Since people often post stack traces to the mailing list that contain IP addresses, it is easy for others to view the info pages of the jobtracker.
These pages may contain more security-critical information, such as further IP addresses and internal host names.
Therefore this patch adds Basic password authentication to the Jetty server.
The user name is 'admin' and the password can be configured in the Nutch configuration file.
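As a rough illustration of the check such a patch has to perform, here is a minimal sketch of validating an HTTP Basic `Authorization` header against the fixed user 'admin' and a configured password. It is not the code from the attached patch and does not use any Jetty API; the class name, method names and the sample password are hypothetical, and the sketch matches the `Basic` scheme prefix literally for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

/** Illustrative only: not the code from the patch and not a Jetty API. */
final class BasicAuthSketch {

    // The fixed user name comes from the issue text.
    static final String USER = "admin";
    static final String SCHEME_PREFIX = "Basic ";

    /** Builds the value a client would send in the Authorization header. */
    static String headerFor(String user, String password) {
        byte[] pair = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return SCHEME_PREFIX + Base64.getEncoder().encodeToString(pair);
    }

    /** Returns true only if the header carries USER and the configured password. */
    static boolean isAuthorized(String authorizationHeader, String configuredPassword) {
        if (authorizationHeader == null || !authorizationHeader.startsWith(SCHEME_PREFIX)) {
            return false;
        }
        byte[] supplied;
        try {
            String encoded = authorizationHeader.substring(SCHEME_PREFIX.length()).trim();
            supplied = Base64.getDecoder().decode(encoded);
        } catch (IllegalArgumentException e) {
            return false; // payload is not valid Base64
        }
        byte[] expected = (USER + ":" + configuredPassword).getBytes(StandardCharsets.UTF_8);
        // Compare the raw "user:password" bytes rather than using String.equals.
        return MessageDigest.isEqual(expected, supplied);
    }

    public static void main(String[] args) {
        String configured = "s3cret"; // hypothetical value, assumed to come from the Nutch configuration
        System.out.println(isAuthorized(headerFor("admin", "s3cret"), configured));
        System.out.println(isAuthorized(headerFor("admin", "wrong"), configured));
        System.out.println(isAuthorized(null, configured));
    }
}
```

Note that Basic authentication only encodes the credentials, it does not encrypt them, so a check like this protects the info pages from casual browsing rather than from someone who can observe the traffic.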
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
Re: NPE in Indexer.java line 184
Posted by Gal Nitzan <gn...@usa.net>.
OK. thanks for the patch.
I shall embed it tonight.
I promise :) to let you know...
Gal.
On Mon, 2006-01-09 at 10:53 +0100, Andrzej Bialecki wrote:
> Gal Nitzan wrote:
>
> >Sorry :) no.
> >
> >
> >
>
> Hmm. ok. :) But I think that patch is needed anyway, because now we
> silently assume that parse plugins will always copy all Content metadata
> to ParseData.metadata, while it may not be the case - and it certainly
> does not happen if there is a parse error ... and this patch fixes it.
> Later on, Indexer tries to retrieve these values from
> parseData.metadata, and not from the content.metadata (because we try to
> avoid reading too much data, so the content part of a segment is not
> accessed during indexing).
>
> >I run fetcher with parse.
> >
> >This NPE happens for only a few documents and that is the problem :)
> >
> >
>
> Ok, then I think I know what is going on... Please try this patch -
> that's the same problem, actually: these few documents failed to parse,
> and we got an empty parseData - but in this case it also means empty
> metadata, which means neither segment name nor score in parseData.metadata.
>
> Please test and report if it helps.
>
> plain text document attachment (patch)
> Index: Fetcher.java
> ===================================================================
> --- Fetcher.java (revision 367099)
> +++ Fetcher.java (working copy)
> @@ -223,6 +223,9 @@
> parse.getData().getMetadata().setProperty(SIGNATURE_KEY, StringUtil.toHexString(signature));
> datum.setSignature(signature);
> }
> + // add segment name and score to parseData metadata
> + parse.getData().getMetadata().setProperty(SEGMENT_NAME_KEY, segmentName);
> + parse.getData().getMetadata().setProperty(SCORE_KEY, Float.toString(datum.getScore()));
>
> try {
> output.collect
Re: What/how num of required maps is set? OOP Wrong list
Posted by Gal Nitzan <gn...@usa.net>.
On Mon, 2006-01-09 at 12:07 +0200, Gal Nitzan wrote:
> I am trying to figure out how the number of required maps is set/calculated by
> Nutch.
>
> I have 3 task trackers.
>
> I added one more.
>
> When I run fetch, only the initial three are fetching.
>
> I added the task tracker before calling generate (if that makes any
> difference).
>
> Thanks,
>
> G.
>
>
>
>
What/how num of required maps is set?
Posted by Gal Nitzan <gn...@usa.net>.
I am trying to figure out how the number of required maps is set/calculated by
Nutch.
I have 3 task trackers.
I added one more.
When I run fetch, only the initial three are fetching.
I added the task tracker before calling generate (if that makes any
difference).
Thanks,
G.
Re: NPE in Indexer.java line 184
Posted by Andrzej Bialecki <ab...@getopt.org>.
Gal Nitzan wrote:
>Sorry :) no.
>
>
>
Hmm. ok. :) But I think that patch is needed anyway, because now we
silently assume that parse plugins will always copy all Content metadata
to ParseData.metadata, while it may not be the case - and it certainly
does not happen if there is a parse error ... and this patch fixes it.
Later on, Indexer tries to retrieve these values from
parseData.metadata, and not from the content.metadata (because we try to
avoid reading too much data, so the content part of a segment is not
accessed during indexing).
>I run fetcher with parse.
>
>This NPE happens for only a few documents and that is the problem :)
>
>
Ok, then I think I know what is going on... Please try this patch -
that's the same problem, actually: these few documents failed to parse,
and we got an empty parseData - but in this case it also means empty
metadata, which means neither segment name nor score in parseData.metadata.
Please test and report if it helps.
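
A minimal sketch of the idea behind both patches in this thread: copy the segment name and score into the parse metadata explicitly, whether or not the parse plugin carried them over. This is not the Nutch code. `java.util.Properties` stands in for Nutch's `ContentProperties`, and the key strings, the sample values and the `copyRequired` helper are hypothetical.

```java
import java.util.Properties;

/** Illustrative stand-in: java.util.Properties replaces Nutch's ContentProperties here. */
final class MetadataCopySketch {

    // Hypothetical key strings; the patches use Fetcher.SEGMENT_NAME_KEY and Fetcher.SCORE_KEY.
    static final String SEGMENT_NAME_KEY = "segment.name";
    static final String SCORE_KEY = "score";

    /** Copies the listed keys from content metadata to parse metadata, skipping absent ones. */
    static void copyRequired(Properties contentMeta, Properties parseMeta, String... keys) {
        for (String key : keys) {
            String value = contentMeta.getProperty(key);
            if (value != null) { // Properties.setProperty rejects null values
                parseMeta.setProperty(key, value);
            }
        }
    }

    public static void main(String[] args) {
        Properties contentMeta = new Properties();
        contentMeta.setProperty(SEGMENT_NAME_KEY, "20060109094543"); // hypothetical segment name
        contentMeta.setProperty(SCORE_KEY, Float.toString(1.0f));

        // A failed parse yields empty parse metadata.
        Properties parseMeta = new Properties();

        copyRequired(contentMeta, parseMeta, SEGMENT_NAME_KEY, SCORE_KEY);
        System.out.println(parseMeta.getProperty(SEGMENT_NAME_KEY));
        System.out.println(parseMeta.getProperty(SCORE_KEY));
    }
}
```

The null check matters because a document whose content metadata lacks one of the keys would otherwise fail at copy time instead of at indexing time.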
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: NPE in Indexer.java line 184
Posted by Gal Nitzan <gn...@usa.net>.
Sorry :) no.
I run fetcher with parse.
This NPE happens for only a few documents and that is the problem :)
On Mon, 2006-01-09 at 09:43 +0100, Andrzej Bialecki wrote:
> Gal Nitzan wrote:
>
> >Hi Andrzej,
> >
> >My message is "value cannot be null" :)
> >
> >
> >
>
> :)
>
> I'm guessing that you are using Fetcher in non-parsing mode, and then
> you run ParseSegment as a separate step, right?
>
> Please try the attached patch.
>
> plain text document attachment (patch)
> Index: ParseSegment.java
> ===================================================================
> --- ParseSegment.java (revision 367099)
> +++ ParseSegment.java (working copy)
> @@ -58,9 +58,16 @@
> status = new ParseStatus(e);
> }
>
> + ContentProperties metadata = parse.getData().getMetadata();
> // compute the new signature
> byte[] signature = SignatureFactory.getSignature(getConf()).calculate(content, parse);
> - parse.getData().getMetadata().setProperty(Fetcher.SIGNATURE_KEY, StringUtil.toHexString(signature));
> + metadata.setProperty(Fetcher.SIGNATURE_KEY, StringUtil.toHexString(signature));
> + // copy segment name and score
> + String segmentName = content.getMetadata().getProperty(Fetcher.SEGMENT_NAME_KEY);
> + String score = content.getMetadata().getProperty(Fetcher.SCORE_KEY);
> + metadata.setProperty(Fetcher.SEGMENT_NAME_KEY, segmentName);
> + metadata.setProperty(Fetcher.SCORE_KEY, score);
> +
> if (status.isSuccess()) {
> output.collect(key, new ParseImpl(parse.getText(), parse.getData()));
> } else {
Re: NPE in Indexer.java line 184
Posted by Andrzej Bialecki <ab...@getopt.org>.
Gal Nitzan wrote:
>Hi Andrzej,
>
>My message is "value cannot be null" :)
>
>
>
:)
I'm guessing that you are using Fetcher in non-parsing mode, and then
you run ParseSegment as a separate step, right?
Please try the attached patch.
Re: NPE in Indexer.java line 184
Posted by Gal Nitzan <gn...@usa.net>.
Hi Andrzej,
My message is "value cannot be null" :)
060109 094543 task_r_9xvvcz Could not get property: segment name
060109 094543 task_r_9xvvcz [Ljava.lang.StackTraceElement;@154864a
060109 094543 task_r_9xvvcz java.lang.NullPointerException: value cannot be null
060109 094543 task_r_9xvvcz at org.apache.lucene.document.Field.<init>(Field.java:469)
060109 094543 task_r_9xvvcz at org.apache.lucene.document.Field.<init>(Field.java:412)
060109 094543 task_r_9xvvcz at org.apache.lucene.document.Field.UnIndexed(Field.java:195)
060109 094543 task_r_9xvvcz at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:200)
060109 094543 task_r_9xvvcz at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
060109 094543 task_r_9xvvcz at org.apache.nutch.mapred.TaskTracker$Child.main(TaskTracker.java:603)
Gal
On Sun, 2006-01-08 at 10:07 +0100, Andrzej Bialecki wrote:
> Gal Nitzan wrote:
>
> >Hi
> >
> >While the reduce task is running I sometimes get this exception, and it
> >breaks the whole job.
> >
> >As a workaround I put this line in a try/catch and just return; however,
> >I am not sure why the metadata cannot find the segment name key.
> >
> >This workaround is good for now.
> >
> >
> >
>
> Stacktrace?
>
Re: NPE in Indexer.java line 184
Posted by Andrzej Bialecki <ab...@getopt.org>.
Gal Nitzan wrote:
>Hi
>
>While the reduce task is running I sometimes get this exception, and it
>breaks the whole job.
>
>As a workaround I put this line in a try/catch and just return; however,
>I am not sure why the metadata cannot find the segment name key.
>
>This workaround is good for now.
>
>
>
Stacktrace?
NPE in Indexer.java line 184
Posted by Gal Nitzan <gn...@usa.net>.
Hi
While the reduce task is running I sometimes get this exception, and it
breaks the whole job.
As a workaround I put this line in a try/catch and just return; however,
I am not sure why the metadata cannot find the segment name key.
This workaround is good for now.
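
A minimal sketch of an alternative to catching the exception: check for the missing value before building the field, and skip the document if it is absent. This is not the Indexer code and does not use Lucene. `storedField` only mimics a field constructor that rejects null, `java.util.Properties` stands in for the parse metadata, and all names and sample values are hypothetical.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.Properties;

/** Illustrative only: mimics a field constructor that rejects null values. */
final class IndexGuardSketch {

    /** Stand-in for creating a stored field; throws NullPointerException on a null value. */
    static String storedField(String name, String value) {
        Objects.requireNonNull(value, "value cannot be null");
        return name + "=" + value;
    }

    /** Returns the field, or empty when the metadata lacks the key so the caller can skip the document. */
    static Optional<String> fieldFromMetadata(Properties parseMeta, String key, String fieldName) {
        String value = parseMeta.getProperty(key);
        if (value == null) {
            return Optional.empty();
        }
        return Optional.of(storedField(fieldName, value));
    }

    public static void main(String[] args) {
        Properties failedParseMeta = new Properties(); // empty, as after a parse error
        Properties goodParseMeta = new Properties();
        goodParseMeta.setProperty("segment.name", "20060109094543"); // hypothetical key and value

        System.out.println(fieldFromMetadata(failedParseMeta, "segment.name", "segment").isPresent());
        System.out.println(fieldFromMetadata(goodParseMeta, "segment.name", "segment").isPresent());
    }
}
```

An explicit check keeps unrelated NullPointerExceptions visible, whereas a broad try/catch around the line would also hide them.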
G.
[jira] Updated: (NUTCH-166) secure jobtracker info pages with a password
Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-166?page=all ]
Stefan Groschupf updated NUTCH-166:
-----------------------------------
Attachment: passwordPatch.txt
> secure jobtracker info pages with a password
> --------------------------------------------
>
> Key: NUTCH-166
> URL: http://issues.apache.org/jira/browse/NUTCH-166
> Project: Nutch
> Type: Improvement
> Versions: 0.8-dev
> Reporter: Stefan Groschupf
> Fix For: 0.8-dev
> Attachments: passwordPatch.txt
>
> Since people often post stack traces to the mailing list that contain IP addresses, it is easy for others to view the info pages of the jobtracker.
> These pages may contain more security-critical information, such as further IP addresses and internal host names.
> Therefore this patch adds Basic password authentication to the Jetty server.
> The user name is 'admin' and the password can be configured in the Nutch configuration file.
[jira] Resolved: (NUTCH-166) secure jobtracker info pages with a password
Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-166?page=all ]
Sami Siren resolved NUTCH-166:
------------------------------
Resolution: Won't Fix
This is Hadoop-related.
> secure jobtracker info pages with a password
> --------------------------------------------
>
> Key: NUTCH-166
> URL: http://issues.apache.org/jira/browse/NUTCH-166
> Project: Nutch
> Type: Improvement
> Versions: 0.8-dev
> Reporter: Stefan Groschupf
> Fix For: 0.8-dev
> Attachments: passwordPatch.txt
>
> Since people often post stack traces to the mailing list that contain IP addresses, it is easy for others to view the info pages of the jobtracker.
> These pages may contain more security-critical information, such as further IP addresses and internal host names.
> Therefore this patch adds Basic password authentication to the Jetty server.
> The user name is 'admin' and the password can be configured in the Nutch configuration file.