Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/09/23 19:47:50 UTC

[jira] Closed: (TIKA-20) A convenience method for getting a document's text in a single method would be helpful.

     [ https://issues.apache.org/jira/browse/TIKA-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann closed TIKA-20.
---------------------------------

    Resolution: Fixed

This issue is addressed in the patch for TIKA-17.

> A convenience method for getting a document's text in a single method would be helpful.
> ---------------------------------------------------------------------------------------
>
>                 Key: TIKA-20
>                 URL: https://issues.apache.org/jira/browse/TIKA-20
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>            Priority: Minor
>             Fix For: 0.1-incubator
>
>
> A convenience method for getting a document's text in a single method would be helpful.
> This would address the common use case of wanting the string content, but not the document metadata.
> Sample methods are below:
> ------------------------------------------------------------------ 
>     /** 
>      * Gets the full text (but not other properties) of the document
>      * at the specified URL.
>      *
>      * @param documentUrl URL of the resource to parse
>      * @param configUrl URL of the Tika configuration
>      * @return the document's full text 
>      */ 
>     public static String getStrContent(URL documentUrl, URL configUrl) 
>             throws LiusException, IOException { 
>         return getStrContent(documentUrl, 
>                 LiusConfig.getInstance(configUrl)); 
>     } 
>     /** 
>      * Gets the full text (but not other properties) of the document
>      * at the specified URL.
>      * 
>      * @param documentUrl URL of the resource to parse 
>      * @param config Tika configuration object 
>      * @return the document's full text 
>      */ 
>     public static String getStrContent(URL documentUrl, LiusConfig config) 
>             throws LiusException, IOException { 
>         String fulltext = null; 
>         if (documentUrl != null) { 
>             Parser parser = ParserFactory.getParser(documentUrl, config); 
>             fulltext = parser.getStrContent(); 
>         } 
>         return fulltext; 
>     } 
> =========================
> This code assumes changes to the code base, not yet committed, that will let us use URLs to specify input documents. (See TIKA-17.)
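
For readers without the code base of that time, the pattern proposed above can be approximated in a self-contained form. This is only a sketch under stated assumptions: TextParser and ParserLookup are hypothetical stand-ins for the Parser, ParserFactory and LiusConfig types named in the proposal, not the actual Tika API, and the plain-text lookup exists only so the example runs on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StrContentSketch {

    /** Hypothetical stand-in for the proposal's Parser type. */
    interface TextParser {
        String getStrContent() throws IOException;
    }

    /** Hypothetical stand-in for ParserFactory combined with its configuration. */
    interface ParserLookup {
        TextParser getParser(URL documentUrl) throws IOException;
    }

    /**
     * Returns the document's full text, or null when no URL is given,
     * mirroring the null handling of the proposed method.
     */
    static String getStrContent(URL documentUrl, ParserLookup lookup)
            throws IOException {
        if (documentUrl == null) {
            return null;
        }
        return lookup.getParser(documentUrl).getStrContent();
    }

    /** Assumed lookup that treats every resource as UTF-8 plain text. */
    static final ParserLookup PLAIN_TEXT = url -> () -> {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    };

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the sketch needs no network access.
        Path sample = Files.createTempFile("strcontent", ".txt");
        Files.writeString(sample, "hello tika");
        System.out.println(getStrContent(sample.toUri().toURL(), PLAIN_TEXT));
        Files.delete(sample);
    }
}
```

The caller passes a URL and receives a String without touching the parser or any metadata, which is the use case the issue describes. Returning null for a null URL follows the proposal as written; throwing an exception instead would be an equally reasonable choice.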
