Posted to dev@streams.apache.org by "Matthew Hager [W2O Digital]" <mh...@w2odigital.com> on 2014/05/13 19:39:46 UTC

Streams-Processor-URLs

Team,

I am looking to expand streams-processor-urls to include pulling content from the page, determining what type of content it is, and extracting as much metadata as possible from the page. Would anyone have any objections to placing that in the same package, since there will be a lot of overlap between helper functions and dependencies?

If not, I'll create the stories and start working on it.

Thank you for your time!

Thanks!
Matthew

Re: Streams-Processor-URLs

Posted by "Matthew Hager [W2O Digital]" <mh...@w2odigital.com>.
Alright, got Any23 working... I can probably use this and have Tika /
Boilerpipe be a fall-back. I'll experiment. If I come up with anything,
I'll implement it in my project first, test it out, then build some
'issues' and push it back to streams.
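
For illustration only, the fall-back idea above could be sketched roughly as
below. Everything here is an assumption made for the example: the Extractor
interface, the class and method names, and the two regex-based stand-ins are
hypothetical and do not call Any23, Tika, or Boilerpipe. The first stand-in
plays the role of a structured-data extractor, the second the role of a
plain-text fall-back.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FallbackExtractor {

    // Hypothetical contract: page HTML in, metadata out (empty if nothing was found).
    interface Extractor {
        Optional<Map<String, String>> extract(String html);
    }

    private static final Pattern OG_TITLE =
        Pattern.compile("<meta\\s+property=\"og:title\"\\s+content=\"([^\"]*)\"");
    private static final Pattern TITLE_TAG =
        Pattern.compile("<title>([^<]*)</title>");

    // Stand-in for the structured-data step: only understands an og:title meta tag.
    static Optional<Map<String, String>> structured(String html) {
        Matcher m = OG_TITLE.matcher(html);
        return m.find()
            ? Optional.of(Map.of("title", m.group(1), "source", "structured"))
            : Optional.empty();
    }

    // Stand-in for the fall-back step: only reads the <title> element.
    static Optional<Map<String, String>> plainTitle(String html) {
        Matcher m = TITLE_TAG.matcher(html);
        return m.find()
            ? Optional.of(Map.of("title", m.group(1), "source", "fallback"))
            : Optional.empty();
    }

    // Preferred extractor first, fall-back second.
    static List<Extractor> defaultChain() {
        return List.of(FallbackExtractor::structured, FallbackExtractor::plainTitle);
    }

    // Tries each extractor in order and returns the first non-empty result.
    static Map<String, String> extract(String html, List<Extractor> chain) {
        for (Extractor extractor : chain) {
            try {
                Optional<Map<String, String>> result = extractor.extract(html);
                if (result.isPresent() && !result.get().isEmpty()) {
                    return result.get();
                }
            } catch (RuntimeException ex) {
                // A failing extractor should not stop the chain; try the next one.
            }
        }
        return Map.of();
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Plain page</title></head></html>";
        System.out.println(extract(page, defaultChain()));
    }
}
```

Keeping the chain as an ordered list means a further extractor can be added
or reordered without touching the loop.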

On 5/13/14, 9:21 PM, "Steve Blackmon" <sb...@apache.org> wrote:

>No objections, that will be a great feature.  Apache Any23 may be of
>interest since it contains a growing catalog of common microformats.
>
>Steve Blackmon
>sblackmon@apache.org


Re: Streams-Processor-URLs

Posted by Steve Blackmon <sb...@apache.org>.
No objections, that will be a great feature.  Apache Any23 may be of
interest since it contains a growing catalog of common microformats.

Steve Blackmon
sblackmon@apache.org

