Posted to dev@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/04/14 12:49:54 UTC

Precopy http.agent properties to nutch-site

Hi guys,

Maybe a last convenience would be to precopy the mandatory http.agent
properties to nutch-site. This would, in my opinion, encourage users to set
the properties where they belong, in nutch-site, rather than in nutch-default.
Thoughts?
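
To make the proposal concrete, here is a minimal sketch of what "precopying" could produce: a nutch-site.xml skeleton with one empty entry per agent property. The property names listed and the helper build_site_config are assumptions for illustration, not taken from this thread; check conf/nutch-default.xml in your Nutch version for the actual set.

```python
import xml.etree.ElementTree as ET

# Assumed property names, for illustration only; verify them against
# conf/nutch-default.xml in the Nutch version you are using.
AGENT_PROPERTIES = [
    "http.agent.name",
    "http.agent.description",
    "http.agent.url",
    "http.agent.email",
]


def build_site_config(names):
    """Return a nutch-site.xml skeleton with one empty <property> per name.

    Hypothetical helper, not part of Nutch.
    """
    root = ET.Element("configuration")
    for name in names:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        # Left empty on purpose: the value is specific to each crawler operator.
        ET.SubElement(prop, "value").text = ""
    return ET.tostring(root, encoding="unicode")


site_xml = build_site_config(AGENT_PROPERTIES)
print(site_xml)
```

The values stay empty so that the shipped file never claims an identity on the operator's behalf; the user only has to fill in the blanks.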

Cheers,
-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Precopy http.agent properties to nutch-site

Posted by Markus Jelsma <ma...@openindex.io>.
Yes, it makes sense to provide a working setup. But since the http.agent.*
properties depend on the user, what values would be sensible? Certainly not
values that would suggest that nutch.apache.org operates the crawler.

On Tuesday 26 April 2011 19:24:33 Susam Pal wrote:
> 
> I would suggest that these properties are set to sensible values in
> 'conf/nutch-default.xml' itself. I have found it inconvenient to
> override these properties every time I have installed Nutch. IMHO it
> would be good to have a working configuration available with the
> source code and distribution.
> 
> Regards,
> Susam Pal


Re: Precopy http.agent properties to nutch-site

Posted by Susam Pal <su...@gmail.com>.
On Tue, Apr 26, 2011 at 10:38 PM, Markus Jelsma
<ma...@openindex.io> wrote:
> Hi,
>
> Of course, but since the agent.* params are mandatory (fetcher will abort when
> not specified) we could then add to the error message that these params (like
> all) must be set in nutch-site. New users would then keep using nutch-site, at
> least that's the idea ;)
>
> I think that if nutch-default is set to read-only, users that try to modify
> will indeed immediately change to write permission and continue to use the
> wrong config.
>
> Cheers,
>
>> Hi Markus
>>
>> Any param overridden by the users should be in nutch-site.xml, not just
>> http.agent, so why make an exception for it? Moreover that will not
>> necessarily prevent people from using nutch-default.xml
>>
>> Maybe we could set nutch-default to readonly? Could be changed by the user
>> but this might nudge them in the right direction
>>
>> Julien
>>
>> On 26 April 2011 16:55, Markus Jelsma <ma...@openindex.io> wrote:
>> > Bump. Thoughts?
>> >
>> > On Thursday 14 April 2011 12:49:54 Markus Jelsma wrote:
>> > > Hi guys,
>> > >
>> > > Maybe a last convenience would be to precopy the mandatory http.agent
>> > > properties to nutch-site. This would, in my opinion, encourage users
>> > > not to set the properties in nutch-default but where it should, in
>> > > nutch-site. Thoughts?
>> > >
>> > > Cheers,

I would suggest that these properties are set to sensible values in
'conf/nutch-default.xml' itself. I have found it inconvenient to
override these properties every time I have installed Nutch. IMHO it
would be good to have a working configuration available with the
source code and distribution.

Regards,
Susam Pal

Re: Precopy http.agent properties to nutch-site

Posted by Markus Jelsma <ma...@openindex.io>.
Hi,

Of course, but since the agent.* params are mandatory (the fetcher will abort
when they are not specified) we could then add to the error message that these
params (like all others) must be set in nutch-site. New users would then keep
using nutch-site, at least that's the idea ;)
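
As a rough illustration of that idea, the sketch below shows a startup check that aborts when a mandatory agent property is unset and names nutch-site.xml in the error message. It is not Nutch's actual fetcher code: the exception class, the function name, the message wording, and the choice of which properties count as mandatory are all assumptions.

```python
class MissingAgentConfig(Exception):
    """Hypothetical error raised when a mandatory agent property is unset."""


# Assumption: which properties are treated as mandatory is illustrative only.
MANDATORY_PROPERTIES = ("http.agent.name",)


def check_agent_config(conf):
    """Abort with a message that points the user at nutch-site.xml."""
    missing = [
        key for key in MANDATORY_PROPERTIES
        if not (conf.get(key) or "").strip()
    ]
    if missing:
        raise MissingAgentConfig(
            "Missing mandatory properties: " + ", ".join(missing)
            + ". Set them in conf/nutch-site.xml, not in conf/nutch-default.xml."
        )


# An empty configuration triggers the error and shows the hint.
try:
    check_agent_config({})
except MissingAgentConfig as err:
    message = str(err)
    print(message)
```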

I think that if nutch-default is set to read-only, users who try to modify
it will simply restore write permission and continue to use the
wrong config.

Cheers,

> Hi Markus
> 
> Any param overridden by the users should be in nutch-site.xml, not just
> http.agent, so why make an exception for it? Moreover that will not
> necessarily prevent people from using nutch-default.xml
> 
> Maybe we could set nutch-default to readonly? Could be changed by the user
> but this might nudge them in the right direction
> 
> Julien
> 
> On 26 April 2011 16:55, Markus Jelsma <ma...@openindex.io> wrote:
> > Bump. Thoughts?
> > 
> > On Thursday 14 April 2011 12:49:54 Markus Jelsma wrote:
> > > Hi guys,
> > > 
> > > Maybe a last convenience would be to precopy the mandatory http.agent
> > > properties to nutch-site. This would, in my opinion, encourage users
> > > not to set the properties in nutch-default but where it should, in
> > > nutch-site. Thoughts?
> > > 
> > > Cheers,

Re: Precopy http.agent properties to nutch-site

Posted by Julien Nioche <li...@gmail.com>.
Hi Markus

Any param overridden by the users should be in nutch-site.xml, not just
http.agent, so why make an exception for it? Moreover, that will not
necessarily prevent people from using nutch-default.xml.

Maybe we could set nutch-default to read-only? It could still be changed by the
user, but this might nudge them in the right direction.
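
For reference, a minimal sketch of what marking the file read-only could look like, assuming a POSIX-style permission model. The helper make_read_only is hypothetical, and a throwaway temporary file stands in for conf/nutch-default.xml.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path):
    """Clear the write bits for owner, group and others on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


# Demonstrate on a temporary file standing in for conf/nutch-default.xml.
with tempfile.TemporaryDirectory() as tmp:
    default_xml = os.path.join(tmp, "nutch-default.xml")
    with open(default_xml, "w") as fh:
        fh.write("<configuration/>\n")
    make_read_only(default_xml)
    write_bits = os.stat(default_xml).st_mode & WRITE_BITS
    print("write bits still set:", bool(write_bits))
```

This is only a nudge: the file's owner can restore the write bit at any time.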

Julien


On 26 April 2011 16:55, Markus Jelsma <ma...@openindex.io> wrote:

> Bump. Thoughts?
>
> On Thursday 14 April 2011 12:49:54 Markus Jelsma wrote:
> > Hi guys,
> >
> > Maybe a last convenience would be to precopy the mandatory http.agent
> > properties to nutch-site. This would, in my opinion, encourage users not
> > to set the properties in nutch-default but where it should, in nutch-site.
> > Thoughts?
> >
> > Cheers,



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Precopy http.agent properties to nutch-site

Posted by Markus Jelsma <ma...@openindex.io>.
Bump. Thoughts?

On Thursday 14 April 2011 12:49:54 Markus Jelsma wrote:
> Hi guys,
> 
> Maybe a last convenience would be to precopy the mandatory http.agent
> properties to nutch-site. This would, in my opinion, encourage users not to
> set the properties in nutch-default but where it should, in nutch-site.
> Thoughts?
> 
> Cheers,
