Posted to user@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2012/01/30 14:12:32 UTC

application/xhtml+xml => text/html

Hi,

Should we not provide an optional replacement for the content type field in
index-more? The two types are the same for end users but end up as different
values in an index.

Thoughts?
Thanks

Re: application/xhtml+xml => text/html

Posted by Markus Jelsma <ma...@openindex.io>.
Created issue:
https://issues.apache.org/jira/browse/NUTCH-1262

On Tuesday 31 January 2012 06:58:56 Alexander Aristov wrote:
> Hi
> 
> Of course we all understand that these two types are not the same and serve
> different purposes, but since Nutch doesn't distinguish between them, it
> would be possible and reasonable to make the content type the same.
> 
> But there might be some problems. Some Nutch users might rely on the
> content type and apply a special parser for application/xhtml+xml,
> perhaps taking additional namespaces into account.
> 
> Of course, for indexing and searching the replacement would be good.
> 
> In fact, there are many other examples where different content types can
> be treated in the same way. What if we had a feature for grouping several
> content types into a single one?
> 
> Best Regards
> Alexander Aristov
> 
> On 30 January 2012 17:12, Markus Jelsma <ma...@openindex.io> wrote:
> > Hi,
> > 
> > Should we not provide an optional replacement for the content type field
> > in index-more? The two types are the same for end users but end up as
> > different values in an index.
> > 
> > Thoughts?
> > Thanks

-- 
Markus Jelsma - CTO - Openindex

Re: application/xhtml+xml => text/html

Posted by Alexander Aristov <al...@gmail.com>.
Hi

Of course we all understand that these two types are not the same and serve
different purposes, but since Nutch doesn't distinguish between them, it
would be possible and reasonable to make the content type the same.

But there might be some problems. Some Nutch users might rely on the
content type and apply a special parser for application/xhtml+xml,
perhaps taking additional namespaces into account.

Of course, for indexing and searching the replacement would be good.

In fact, there are many other examples where different content types can
be treated in the same way. What if we had a feature for grouping several
content types into a single one?

Best Regards
Alexander Aristov


On 30 January 2012 17:12, Markus Jelsma <ma...@openindex.io> wrote:

> Hi,
>
> Should we not provide an optional replacement for the content type field in
> index-more? The two types are the same for end users but end up as different
> values in an index.
>
> Thoughts?
> Thanks
>
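
The grouping idea raised in this thread (several content types collapsed into
a single value at indexing time, applied only when enabled) can be sketched
roughly as below. This is only an illustration under stated assumptions: the
class ContentTypeGrouping and its methods group() and resolve() are
hypothetical names, not part of Nutch or of the index-more plugin, and the
sketch says nothing about how NUTCH-1262 was or will be implemented. Because
unmapped types pass through unchanged, users who rely on
application/xhtml+xml for a special parser simply leave that type ungrouped.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, NOT a Nutch API: maps several content types
// onto one canonical type before the value is written to the index.
final class ContentTypeGrouping {

    // Source content type (normalized) -> canonical content type.
    private final Map<String, String> mapping = new HashMap<>();

    // Register one canonical type for any number of source types.
    void group(String target, String... sources) {
        for (String source : sources) {
            mapping.put(normalize(source), target);
        }
    }

    // Return the canonical type if one is registered; otherwise return
    // the normalized input, so unmapped types are left alone.
    String resolve(String contentType) {
        if (contentType == null) {
            return null;
        }
        String key = normalize(contentType);
        return mapping.getOrDefault(key, key);
    }

    // Drop parameters such as "; charset=UTF-8", trim, and lower-case.
    private static String normalize(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0
                ? contentType.substring(0, semicolon)
                : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ContentTypeGrouping grouping = new ContentTypeGrouping();
        grouping.group("text/html", "application/xhtml+xml");

        System.out.println(grouping.resolve("application/xhtml+xml; charset=UTF-8"));
        System.out.println(grouping.resolve("application/pdf"));
    }
}
```

With an empty mapping, resolve() only normalizes its input, which is one way
to keep the replacement optional as proposed in the original message.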