Posted to commits@beam.apache.org by "Sergey Beryozkin (JIRA)" <ji...@apache.org> on 2017/06/01 13:06:04 UTC

[jira] [Comment Edited] (BEAM-2328) Introduce Apache Tika Input component

    [ https://issues.apache.org/jira/browse/BEAM-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032904#comment-16032904 ] 

Sergey Beryozkin edited comment on BEAM-2328 at 6/1/17 1:05 PM:
----------------------------------------------------------------

Sorry, Tika already reports the characters. I was confused for a moment, thinking that the default output coder was not used there, but of course that output coder is for converting the String to the output...
As far as Tika is concerned, it is already possible to pass a custom Metadata to TikaInput.Read; I'll just update that to also accept a TikaConfig.


was (Author: sergey_beryozkin):
Sorry, Tika already reports the characters...

> Introduce Apache Tika Input component
> -------------------------------------
>
>                 Key: BEAM-2328
>                 URL: https://issues.apache.org/jira/browse/BEAM-2328
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas, sdk-java-extensions
>            Reporter: Sergey Beryozkin
>            Assignee: Sergey Beryozkin
>             Fix For: 2.1.0
>
>
> Apache Tika is a popular project that offers extensive support for parsing a wide variety of file formats. It is used in many projects, including Lucene and Elasticsearch.
> Supporting a Tika Input (Read) at the Beam level would be of major interest to many users.
> A PR is to follow.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)