Posted to dev@flink.apache.org by Chesnay Schepler <ch...@apache.org> on 2017/06/07 09:15:01 UTC

[DISCUSS] Removal of twitter-inputformat

Hello,

I'm proposing to remove the Twitter-InputFormat in FLINK-6710 
<https://issues.apache.org/jira/browse/FLINK-6710>, with an open PR you 
can find here <https://github.com/apache/flink/pull/3984>.
The PR currently has a +1 from Robert, but Timo raised some concerns 
saying that it is useful for prototyping and
advised me to start a discussion on the ML.

This format is a DelimitedInputFormat that reads JSON objects and turns 
them into a custom tweet class.
I believe this format doesn't provide much value to Flink; there's 
nothing interesting about it as an InputFormat,
as it is purely an exercise in manually converting a JSON object into 
a POJO.
This is apparent since you could just as well use 
ExecutionEnvironment#readTextFile(...) and throw the parsing logic
into a subsequent MapFunction.
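
A minimal sketch of that alternative, under stated assumptions: the local
MapFunction interface below is only a stand-in with the same shape as Flink's
org.apache.flink.api.common.functions.MapFunction, so the example compiles
without Flink on the classpath. SimpleTweet, TweetParser and parseAll are
hypothetical names for illustration, not the classes from the PR, and the
regex extraction is a placeholder for a real JSON library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TweetMapSketch {

    // Stand-in mirroring the shape of Flink's MapFunction (assumption: in a
    // real job you would import Flink's interface instead of declaring this).
    interface MapFunction<T, O> {
        O map(T value) throws Exception;
    }

    // Hypothetical, deliberately tiny tweet class (not Flink's Tweet POJO).
    static class SimpleTweet {
        final long id;
        final String text;

        SimpleTweet(long id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // The parsing logic that would otherwise live inside the InputFormat.
    // Regex extraction keeps the sketch dependency-free; it does not handle
    // escaped quotes or nested objects, so treat it as a placeholder.
    static class TweetParser implements MapFunction<String, SimpleTweet> {
        private static final Pattern ID =
            Pattern.compile("\"id\"\\s*:\\s*(\\d+)");
        private static final Pattern TEXT =
            Pattern.compile("\"text\"\\s*:\\s*\"([^\"]*)\"");

        @Override
        public SimpleTweet map(String line) {
            Matcher id = ID.matcher(line);
            Matcher text = TEXT.matcher(line);
            if (!id.find() || !text.find()) {
                throw new IllegalArgumentException("Not a tweet: " + line);
            }
            return new SimpleTweet(Long.parseLong(id.group(1)), text.group(1));
        }
    }

    // Stands in for the pipeline "read lines, then map each line".
    static List<SimpleTweet> parseAll(List<String> lines) throws Exception {
        MapFunction<String, SimpleTweet> parser = new TweetParser();
        List<SimpleTweet> tweets = new ArrayList<>();
        for (String line : lines) {
            tweets.add(parser.map(line));
        }
        return tweets;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList(
            "{\"id\": 1, \"text\": \"hello flink\"}",
            "{\"id\": 2, \"text\": \"bye\"}");
        for (SimpleTweet t : parseAll(lines)) {
            System.out.println(t.id + " -> " + t.text);
        }
    }
}
```

In an actual job, parseAll would presumably become something like
env.readTextFile(path).map(new TweetParser()), with TweetParser implementing
Flink's own MapFunction; any fix to the text input path then applies without
touching the parser.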

In the PR I suggested replacing this with a JsonInputFormat, but this 
was a misguided attempt at getting Timo to agree
to the removal. Such a format has the same problem outlined above, as it 
could effectively be implemented with a one-liner map function.

So the question now is whether we want to keep it, remove it, or replace 
it with something more general.

Regards,
Chesnay

Re: [DISCUSS] Removal of twitter-inputformat

Posted by sblackmon <sb...@apache.org>.
Hello,

Apache Streams (incubating) maintains and publishes JSON schemas and Jackson-compatible POJOs for Twitter and other popular third-party APIs.

http://streams.apache.org/site/0.5.1-incubating-SNAPSHOT/streams-project/streams-contrib/streams-provider-twitter/index.html

We also have a repository of public examples, one of which demonstrates how to embed various Twitter data collectors into Flink.

http://streams.apache.org/site/0.5.1-incubating-SNAPSHOT/streams-examples/streams-examples-flink/flink-twitter-collection/index.html

We’d welcome support from anyone in the Flink project to help us maintain and improve these examples.  Potentially, Flink could keep the benefit of useful, ready-to-run examples for new Flink users while getting the boring code out of its code base.  Also, our examples have integration tests that actually connect to Twitter and check that everything continues to work :)

If anyone wants to know more about this, feel free to reach out to the team on dev@streams.incubator.apache.org

Steve
sblackmon@apache.org

On June 12, 2017 at 7:18:08 AM, Aljoscha Krettek (aljoscha@apache.org) wrote:

Bumpety-bump.  

I would be in favour of removing this:  
- It can be implemented as a MapFunction parser after a TextInputFormat  
- Additions, changes, and fixes that happen on TextInputFormat are not reflected in SimpleTweetInputFormat  
- SimpleTweetInputFormat overrides nextRecord(), which is not something DelimitedInputFormats are normally supposed to do, I think  
- The Tweet POJO has a very strange naming scheme  

Best,  
Aljoscha  

> On 7. Jun 2017, at 11:15, Chesnay Schepler <ch...@apache.org> wrote:  
>  
> Hello,  
>  
> I'm proposing to remove the Twitter-InputFormat in FLINK-6710 <https://issues.apache.org/jira/browse/FLINK-6710>, with an open PR you can find here <https://github.com/apache/flink/pull/3984>.  
> The PR currently has a +1 from Robert, but Timo raised some concerns saying that it is useful for prototyping and  
> advised me to start a discussion on the ML.  
>  
> This format is a DelimitedInputFormat that reads JSON objects and turns them into a custom tweet class.  
> I believe this format doesn't provide much value to Flink; there's nothing interesting about it as an InputFormat,  
> as it is purely an exercise in manually converting a JSON object into a POJO.  
> This is apparent since you could just as well use ExecutionEnvironment#readTextFile(...) and throw the parsing logic  
> into a subsequent MapFunction.  
>  
> In the PR I suggested replacing this with a JsonInputFormat, but this was a misguided attempt at getting Timo to agree  
> to the removal. This format has the same problem outlined above, as it could be effectively implemented with a one-liner map function.  
>  
> So the question now is whether we want to keep it, remove it, or replace it with something more general.  
>  
> Regards,  
> Chesnay  


Re: [DISCUSS] Removal of twitter-inputformat

Posted by Aljoscha Krettek <al...@apache.org>.
Bumpety-bump.

I would be in favour of removing this:
 - It can be implemented as a MapFunction parser after a TextInputFormat
 - Additions, changes, and fixes that happen on TextInputFormat are not reflected in SimpleTweetInputFormat
 - SimpleTweetInputFormat overrides nextRecord(), which is not something DelimitedInputFormats are normally supposed to do, I think
 - The Tweet POJO has a very strange naming scheme

Best,
Aljoscha

> On 7. Jun 2017, at 11:15, Chesnay Schepler <ch...@apache.org> wrote:
> 
> Hello,
> 
> I'm proposing to remove the Twitter-InputFormat in FLINK-6710 <https://issues.apache.org/jira/browse/FLINK-6710>, with an open PR you can find here <https://github.com/apache/flink/pull/3984>.
> The PR currently has a +1 from Robert, but Timo raised some concerns saying that it is useful for prototyping and
> advised me to start a discussion on the ML.
> 
> This format is a DelimitedInputFormat that reads JSON objects and turns them into a custom tweet class.
> I believe this format doesn't provide much value to Flink; there's nothing interesting about it as an InputFormat,
> as it is purely an exercise in manually converting a JSON object into a POJO.
> This is apparent since you could just as well use ExecutionEnvironment#readTextFile(...) and throw the parsing logic
> into a subsequent MapFunction.
> 
> In the PR I suggested replacing this with a JsonInputFormat, but this was a misguided attempt at getting Timo to agree
> to the removal. This format has the same problem outlined above, as it could be effectively implemented with a one-liner map function.
> 
> So the question now is whether we want to keep it, remove it, or replace it with something more general.
> 
> Regards,
> Chesnay

