Posted to dev@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/04/14 12:49:54 UTC
Precopy http.agent properties to nutch-site
Hi guys,
Maybe a last convenience would be to precopy the mandatory http.agent
properties to nutch-site. This would, in my opinion, encourage users to set
the properties where they belong, in nutch-site, rather than in nutch-default.
Thoughts?
Cheers,
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Precopy http.agent properties to nutch-site
Posted by Markus Jelsma <ma...@openindex.io>.
Yes, it makes sense to provide a working setup. But since the http.agent.*
properties depend on the user, what values would be sensible? It should at
least not be a value suggesting that nutch.apache.org operates the crawler.
On Tuesday 26 April 2011 19:24:33 Susam Pal wrote:
>
> I would suggest that these properties be set to sensible values in
> 'conf/nutch-default.xml' itself. I have found it inconvenient to
> override these properties every time I install Nutch. IMHO it
> would be good to have a working configuration available with the
> source code and distribution.
>
> Regards,
> Susam Pal
Re: Precopy http.agent properties to nutch-site
Posted by Susam Pal <su...@gmail.com>.
On Tue, Apr 26, 2011 at 10:38 PM, Markus Jelsma
<ma...@openindex.io> wrote:
> Hi,
>
> Of course, but since the agent.* params are mandatory (the fetcher aborts when
> they are not specified), we could add to the error message that these params
> (like all other settings) must be set in nutch-site. New users would then keep
> using nutch-site; at least, that's the idea ;)
>
> I think that if nutch-default is made read-only, users who try to modify it
> will simply restore write permission right away and keep using the wrong
> config.
>
> Cheers,
I would suggest that these properties be set to sensible values in
'conf/nutch-default.xml' itself. I have found it inconvenient to
override these properties every time I install Nutch. IMHO it
would be good to have a working configuration available with the
source code and distribution.
Regards,
Susam Pal
Re: Precopy http.agent properties to nutch-site
Posted by Markus Jelsma <ma...@openindex.io>.
Hi,
Of course, but since the agent.* params are mandatory (the fetcher aborts when
they are not specified), we could add to the error message that these params
(like all other settings) must be set in nutch-site. New users would then keep
using nutch-site; at least, that's the idea ;)
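A minimal sketch of the error-message idea, assuming a plain Map in place of Nutch's real configuration object: the check aborts when http.agent.name (taken here as the mandatory http.agent.* property) is missing or blank, and the message points the user at nutch-site.xml. The class name, method name and message text are hypothetical, not Nutch's actual Fetcher code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Nutch's actual Fetcher code: abort at startup when
 * a mandatory http.agent.* property is missing, and say where to set it.
 */
class AgentPropertyCheck {

    // Assumption: http.agent.name is the property being enforced.
    static final String[] MANDATORY = {"http.agent.name"};

    /** Throws IllegalArgumentException naming the first missing property. */
    static void checkAgentProperties(Map<String, String> conf) {
        for (String key : MANDATORY) {
            String value = conf.get(key);
            if (value == null || value.trim().isEmpty()) {
                // The message tells new users where overrides belong.
                throw new IllegalArgumentException(
                        "Fetcher: no value for '" + key + "'. Set it in "
                        + "conf/nutch-site.xml; do not edit conf/nutch-default.xml.");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        try {
            checkAgentProperties(conf); // empty config: aborts with the hint
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        conf.put("http.agent.name", "MyTestCrawler"); // hypothetical agent name
        checkAgentProperties(conf); // passes silently once the property is set
        System.out.println("agent properties OK");
    }
}
```

Keeping the check as a small static method means the same hint could be reused by any other tool that needs the agent properties.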
I think that if nutch-default is made read-only, users who try to modify it
will simply restore write permission right away and keep using the wrong
config.
Cheers,
> Hi Markus
>
> Any param overridden by the users should be in nutch-site.xml, not just
> http.agent, so why make an exception for it? Moreover, that will not
> necessarily prevent people from using nutch-default.xml.
>
> Maybe we could make nutch-default read-only? The user could still change
> that, but it might nudge them in the right direction.
>
> Julien
Re: Precopy http.agent properties to nutch-site
Posted by Julien Nioche <li...@gmail.com>.
Hi Markus
Any param overridden by the users should be in nutch-site.xml, not just
http.agent, so why make an exception for it? Moreover, that will not
necessarily prevent people from using nutch-default.xml.
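The rule that user overrides belong in nutch-site.xml rests on load order: Nutch loads nutch-default.xml first and nutch-site.xml second, so a key present in both takes the site value. A minimal sketch of that layering, assuming plain Maps stand in for the two files; the class and method names and the property values are illustrative, and details such as Hadoop's final properties are ignored.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of layered configuration: defaults first, site
 * overrides second, so the later layer wins for any shared key.
 */
class LayeredConf {

    static Map<String, String> merge(Map<String, String> defaults,
                                     Map<String, String> site) {
        Map<String, String> effective = new HashMap<>(defaults);
        effective.putAll(site); // site values replace defaults for shared keys
        return effective;
    }

    public static void main(String[] args) {
        // Illustrative values only, not copied from Nutch's shipped files.
        Map<String, String> defaults = new HashMap<>();
        defaults.put("http.agent.name", "");
        defaults.put("fetcher.threads.fetch", "10");

        Map<String, String> site = new HashMap<>();
        site.put("http.agent.name", "MyTestCrawler"); // hypothetical agent name

        Map<String, String> conf = merge(defaults, site);
        System.out.println(conf.get("http.agent.name"));       // MyTestCrawler
        System.out.println(conf.get("fetcher.threads.fetch")); // 10
    }
}
```

Because only overridden keys appear in the site layer, nutch-default.xml can stay untouched and still supply every other value.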
Maybe we could make nutch-default read-only? The user could still change
that, but it might nudge them in the right direction.
Julien
On 26 April 2011 16:55, Markus Jelsma <ma...@openindex.io> wrote:
> Bump. Thoughts?
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
Re: Precopy http.agent properties to nutch-site
Posted by Markus Jelsma <ma...@openindex.io>.
Bump. Thoughts?
On Thursday 14 April 2011 12:49:54 Markus Jelsma wrote:
> Hi guys,
>
> Maybe a last convenience would be to precopy the mandatory http.agent
> properties to nutch-site. This would, in my opinion, encourage users to set
> the properties where they belong, in nutch-site, rather than in nutch-default.
> Thoughts?
>
> Cheers,