Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2013/10/07 16:55:42 UTC

[jira] [Created] (NUTCH-1652) Avoid instantiation of MimeUtil for each Content object created

Julien Nioche created NUTCH-1652:
------------------------------------

             Summary: Avoid instantiation of MimeUtil for each Content object created
                 Key: NUTCH-1652
                 URL: https://issues.apache.org/jira/browse/NUTCH-1652
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.7
            Reporter: Julien Nioche


The Content constructor used by the HttpBase class instantiates a new MimeUtil and holds on to it. This is wasteful and slows down the creation of every Content object, since each MimeUtil creates a new Tika instance, reads from the configuration, etc.

Instead, we could create a single instance of the MimeUtil class and pass it to a new Content constructor:

{code}
public Content(String url, String base, byte[] content, String contentType,
      Metadata metadata, MimeUtil mime)
{code}
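A minimal, self-contained sketch of the proposed pattern follows. It deliberately does not use the real Nutch classes: `MimeDetector` is a hypothetical stand-in for MimeUtil whose construction is assumed to be expensive, `Content` is a simplified stand-in offering both the current constructor and the proposed one taking a shared instance, and `detectorsCreated` is a hypothetical helper that counts how many detectors are built for n Content objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedMimeUtilSketch {

  // Hypothetical stand-in for MimeUtil. Construction is assumed to be costly
  // (the issue says the real class creates a Tika instance and reads the configuration).
  static class MimeDetector {
    private static final AtomicInteger CREATED = new AtomicInteger();

    MimeDetector() {
      CREATED.incrementAndGet(); // count every construction
    }

    static int createdCount() { return CREATED.get(); }

    static void resetCount() { CREATED.set(0); }

    // Placeholder logic: trust the declared type, else fall back to a default.
    String detect(String url, String declaredType) {
      return declaredType != null ? declaredType : "application/octet-stream";
    }
  }

  // Hypothetical, simplified stand-in for Content.
  static class Content {
    final String url;
    final String contentType;

    // Current behaviour: every Content builds its own detector.
    Content(String url, String contentType) {
      this(url, contentType, new MimeDetector());
    }

    // Proposed behaviour: the caller (e.g. HttpBase) passes in a shared detector.
    Content(String url, String contentType, MimeDetector mime) {
      this.url = url;
      this.contentType = mime.detect(url, contentType);
    }
  }

  // Creates n Content objects, either each with its own detector or all with
  // one shared detector, and returns how many detectors were constructed.
  static int detectorsCreated(int n, boolean shared) {
    MimeDetector.resetCount();
    MimeDetector sharedMime = shared ? new MimeDetector() : null;
    List<Content> contents = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      String url = "http://example.com/page" + i;
      contents.add(shared ? new Content(url, "text/html", sharedMime)
                          : new Content(url, "text/html"));
    }
    return MimeDetector.createdCount();
  }

  public static void main(String[] args) {
    System.out.println("per-object: " + detectorsCreated(1000, false)); // prints "per-object: 1000"
    System.out.println("shared: " + detectorsCreated(1000, true));      // prints "shared: 1"
  }
}
```

With one shared instance, the setup cost is paid once rather than once per fetched document. Keeping the old constructor and having it delegate to the new one means existing callers would keep compiling.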

and create a single instance of MimeUtil in HttpBase. We would also need to make sure that synchronisation is handled properly in MimeUtil (especially for the calls to Tika), as Content objects are created in a multithreaded environment.
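One way to address the synchronisation concern is to serialize access to the shared instance. The sketch below is illustrative only: `UnsafeDetector` is a hypothetical detector that is deliberately not thread-safe (it reuses a mutable buffer and an unguarded counter), and `SharedMime` guards every call with `synchronized`, standing in for whatever locking MimeUtil might need around its Tika calls. It makes no claim about which Tika methods are or are not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SharedDetectorThreadsSketch {

  // Hypothetical detector that is NOT safe for concurrent use.
  static class UnsafeDetector {
    private final StringBuilder buffer = new StringBuilder(); // reused between calls
    private int calls = 0; // plain int: unsynchronized increments could be lost

    String detect(String url) {
      calls++;
      buffer.setLength(0);
      buffer.append(url.endsWith(".pdf") ? "application/pdf" : "text/html");
      return buffer.toString();
    }

    int calls() { return calls; }
  }

  // One instance shared by all threads; every call into the non-thread-safe
  // delegate is serialized on this object's monitor.
  static class SharedMime {
    private final UnsafeDetector delegate = new UnsafeDetector();

    synchronized String detect(String url) { return delegate.detect(url); }

    synchronized int calls() { return delegate.calls(); }
  }

  // Runs `tasks` detections spread over `threads` threads against one SharedMime
  // and returns how many results did not match the expected type.
  static int run(SharedMime mime, int threads, int tasks) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<Boolean>> results = new ArrayList<>();
    for (int i = 0; i < tasks; i++) {
      final String url = "http://example.com/doc" + i + (i % 2 == 0 ? ".pdf" : ".html");
      final String expected = (i % 2 == 0) ? "application/pdf" : "text/html";
      results.add(pool.submit(() -> expected.equals(mime.detect(url))));
    }
    int wrong = 0;
    for (Future<Boolean> f : results) {
      if (!f.get()) wrong++;
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    return wrong;
  }

  public static void main(String[] args) throws Exception {
    SharedMime mime = new SharedMime();
    int wrong = run(mime, 8, 10_000);
    System.out.println("wrong results: " + wrong + ", calls: " + mime.calls());
    // prints "wrong results: 0, calls: 10000"
  }
}
```

A single lock is the simplest correct option, but it serializes all detection across the fetcher threads. If that turned out to be a bottleneck, a per-thread instance (for example via `ThreadLocal`) would trade some memory for lock-free access while still avoiding one instance per Content object.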




--
This message was sent by Atlassian JIRA
(v6.1#6144)