Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2007/09/23 19:47:50 UTC
[jira] Closed: (TIKA-20) A convenience method for getting a document's text in a single method would be helpful.
[ https://issues.apache.org/jira/browse/TIKA-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann closed TIKA-20.
---------------------------------
Resolution: Fixed
This issue is addressed by the patch for TIKA-17.
> A convenience method for getting a document's text in a single method would be helpful.
> ---------------------------------------------------------------------------------------
>
> Key: TIKA-20
> URL: https://issues.apache.org/jira/browse/TIKA-20
> Project: Tika
> Issue Type: New Feature
> Components: general
> Affects Versions: 0.1-incubator
> Reporter: Keith R. Bennett
> Priority: Minor
> Fix For: 0.1-incubator
>
>
> A convenience method that returns a document's text in a single call would be helpful.
> This would address the common use case of wanting the string content but not the document metadata.
> Sample methods are below:
> ------------------------------------------------------------------
> /**
>  * Gets the full text (but not other properties) of the document
>  * at the specified URL.
>  *
>  * @param documentUrl URL of the resource to parse
>  * @param configUrl URL of the Tika configuration
>  * @return the document's full text
>  */
> public static String getStrContent(URL documentUrl, URL configUrl)
>         throws LiusException, IOException {
>     // Load the configuration from configUrl, then delegate to the
>     // overload that takes a LiusConfig instance.
>     return getStrContent(documentUrl,
>             LiusConfig.getInstance(configUrl));
> }
>
> /**
>  * Gets the full text (but not other properties) of the document
>  * at the specified URL.
>  *
>  * @param documentUrl URL of the resource to parse
>  * @param config Tika configuration object
>  * @return the document's full text, or null if documentUrl is null
>  */
> public static String getStrContent(URL documentUrl, LiusConfig config)
>         throws LiusException, IOException {
>     String fulltext = null;
>     if (documentUrl != null) {
>         // Pick a parser suited to the document, then extract its text.
>         Parser parser = ParserFactory.getParser(documentUrl, config);
>         fulltext = parser.getStrContent();
>     }
>     return fulltext;
> }
> =========================
> This code assumes changes to the code base that have not (yet) been committed, which will let us use URLs to specify input documents. (See TIKA-17.)
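The sample methods above depend on LiusConfig, ParserFactory and LiusException from the code base of the time, so they don't compile on their own. As a rough, self-contained illustration of the same pattern (null check on the URL, parser lookup, return only the text), here is a sketch. Its Parser and ParserFactory are hypothetical stand-ins that only handle ".txt" files by decoding them as UTF-8; they are not the Tika classes of the same names, and the configuration argument is deliberately omitted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, self-contained sketch of the TIKA-20 convenience pattern.
 * Parser and ParserFactory below are illustrative stand-ins, NOT the
 * Tika/Lius classes that share these names.
 */
class ConvenienceSketch {

    /** Hypothetical parser abstraction: turns a stream into plain text. */
    interface Parser {
        String getStrContent(InputStream in) throws IOException;
    }

    /** Hypothetical factory: picks a parser from the URL's file extension. */
    static final class ParserFactory {
        static Parser getParser(URL documentUrl) {
            String path = documentUrl.getPath().toLowerCase();
            if (path.endsWith(".txt")) {
                // Plain text: decode all bytes as UTF-8 (Java 9+ readAllBytes).
                return in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            throw new IllegalArgumentException("No parser for " + documentUrl);
        }
    }

    /**
     * Gets the full text (but not other properties) of the document at the
     * specified URL, or null if the URL is null, mirroring the proposal.
     */
    static String getStrContent(URL documentUrl) throws IOException {
        if (documentUrl == null) {
            return null;
        }
        Parser parser = ParserFactory.getParser(documentUrl);
        // try-with-resources closes the stream even if parsing fails.
        try (InputStream in = documentUrl.openStream()) {
            return parser.getStrContent(in);
        }
    }
}
```

A caller that only wants the text then needs one line, for example `String text = ConvenienceSketch.getStrContent(path.toUri().toURL());`, which is the single-call convenience the issue asks for.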
--
This message is automatically generated by JIRA.