Posted to dev@nutch.apache.org by lewis john mcgibbney <le...@apache.org> on 2021/01/24 21:37:46 UTC

CVE-2021-23901: An XML external entity (XXE) injection vulnerability exists in the Nutch DmozParser

Description:

An XML external entity (XXE) injection vulnerability was discovered in the
Nutch DmozParser and is known to affect Nutch versions < 1.18. XML external
entity injection (also known as XXE) is a web security vulnerability that
allows an attacker to interfere with an application's processing of XML
data. It often allows an attacker to view files on the application server
filesystem, and to interact with any back-end or external systems that the
application itself can access.
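
For readers who parse untrusted XML with the JDK's SAX API, the usual
mitigation for this class of bug is to turn off DOCTYPE and external entity
processing on the parser factory. The following is a minimal sketch under
that assumption only; it is not taken from the NUTCH-2841 patch and may
differ from how the DmozParser was actually fixed. The class and method
names (XxeHardeningSketch, hardenedFactory, parses) are hypothetical; the
feature URIs are the ones recognized by the Xerces-based parser bundled
with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Nutch.
public class XxeHardeningSketch {

  /** Builds a SAX parser factory with DOCTYPE and external entities disabled. */
  static SAXParserFactory hardenedFactory() throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    // Reject any document that carries a DOCTYPE declaration at all.
    factory.setFeature(
        "http://apache.org/xml/features/disallow-doctype-decl", true);
    // Defense in depth: do not resolve external general or parameter entities.
    factory.setFeature(
        "http://xml.org/sax/features/external-general-entities", false);
    factory.setFeature(
        "http://xml.org/sax/features/external-parameter-entities", false);
    factory.setXIncludeAware(false);
    return factory;
  }

  /** Returns true if the hardened parser accepts the document, false if it rejects it. */
  static boolean parses(String xml) throws Exception {
    SAXParser parser = hardenedFactory().newSAXParser();
    try {
      parser.parse(
          new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
          new DefaultHandler());
      return true;
    } catch (SAXException e) {
      // A DOCTYPE in the input surfaces here as a fatal parse error.
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    String benign = "<RDF><Topic/></RDF>";
    String malicious =
        "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE r [<!ENTITY x SYSTEM \"file:///etc/passwd\">]>"
            + "<r>&x;</r>";
    System.out.println("benign accepted: " + parses(benign));
    System.out.println("malicious accepted: " + parses(malicious));
  }
}
```

Rejecting the DOCTYPE outright is the strictest option and only suits input
that never needs a DTD; if a DTD is required, the two external-entity
features alone are the fallback.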


This issue is being tracked as NUTCH-2841

Credit:

The Apache Nutch Project Management Committee would like to thank Martin
Heyden for reporting this issue to the Apache Security Team. We are
indebted.



--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc

Re: CVE-2021-23901: An XML external entity (XXE) injection vulnerability exists in the Nutch DmozParser

Posted by Sebastian Nagel <wa...@googlemail.com>.
 > Do we still need the DMOZ parser?

DMOZ has been offline for 3 years now [1], and none of the projects claiming to be successors [2,3]
provides the RDF dumps required as input for the DMOZ parser.

It will soon become a legacy tool, and we might consider whether it's better to remove it.

I remember that 4 years ago I used DMOZ to seed a crawl of news sites from all over the world.
The coverage of DMOZ was definitely good at that time. But it's clear that it will degrade, and it's
not easy to find a copy of the dumps.

Sebastian

[1] https://en.wikipedia.org/wiki/DMOZ
[2] https://curlie.org/docs/en/rdf.html
[3] http://dmoztools.net/docs/en/rdf.html

On 1/25/21 12:04 PM, BlackIce wrote:
> Do we still need the DMOZ parser?


Re: CVE-2021-23901: An XML external entity (XXE) injection vulnerability exists in the Nutch DmozParser

Posted by BlackIce <bl...@gmail.com>.
Do we still need the DMOZ parser?
