Posted to user@nutch.apache.org by Lyndon Maydwell <ma...@gmail.com> on 2007/12/06 08:11:23 UTC

URL normalization

Is there a way to apply regex normalization to the URLs that are
already in the database?

For example, I would like www.asdf.com to be treated as equivalent to
asdf.com.
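
To make the intent concrete, here is a minimal sketch of the kind of
rewrite I have in mind, in plain Python rather than anything
Nutch-specific. The function name strip_www and the pattern are my own
assumptions for illustration, not an existing Nutch rule or API:

```python
import re

# Hypothetical rule (not taken from Nutch): drop a leading "www." from the
# host, whether or not the URL starts with an http:// or https:// scheme.
_WWW_PREFIX = re.compile(r"^((?:https?://)?)www\.", re.IGNORECASE)


def strip_www(url: str) -> str:
    """Return url with a leading 'www.' removed from the host, if present."""
    # Group 1 holds the optional scheme, which is kept as-is.
    return _WWW_PREFIX.sub(r"\1", url, count=1)


if __name__ == "__main__":
    for candidate in ("www.asdf.com", "http://www.asdf.com/index.html", "asdf.com"):
        print(candidate, "->", strip_www(candidate))
```

With a rule like this, www.asdf.com and asdf.com would both end up as
asdf.com. What I am unsure about is how to get such a rule applied to
entries that were stored before the rule existed.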