Posted to commits@samza.apache.org by "Yan Fang (JIRA)" <ji...@apache.org> on 2014/04/11 23:20:15 UTC

[jira] [Updated] (SAMZA-235) Add internal input stream for hello-samza

     [ https://issues.apache.org/jira/browse/SAMZA-235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Fang updated SAMZA-235:
---------------------------

    Description: 
As reported by Sonali and Yan Fang, some corporations block the IRC service/port that hello-samza uses to read the Wikipedia edit feed, so users on those networks cannot run hello-samza successfully. http://mail-archives.apache.org/mod_mbox/samza-dev/201403.mbox/%3CB84B01583BEBBC45AD442B3F9045B8AC0ED46F80@048-CH1MPN3-331.048d.mgd.msft.net%3E

As suggested by [~jghoman] and [~criccomini], we should add an internal input stream for hello-samza as an alternative. There are two options:
1. Use simulated/fake data.
2. Use data from the local environment.

I lean toward the first approach. We can simulate Wikimedia data (though it is a little boring), because that lets us reuse the WikipediaParserStreamTask and WikipediaStatsStreamTask. Another reason is that, since the data is simulated, the output is very predictable, which will help bring hello-samza into the integration test described in SAMZA-205.
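
The predictability argument for approach 1 can be sketched as follows, assuming the fake feed is driven by a seeded java.util.Random: the same seed always yields the same sequence, so every run sees identical fake edits and a test can compare job output against a fixed expectation. The class name SimulatedWikipediaFeed, the page/user lists, and the tab-separated line format are hypothetical; a real implementation would need to emit whatever message format WikipediaParserStreamTask expects so the existing tasks can be reused unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of a deterministic fake Wikipedia edit feed.
 * The line format (page TAB user TAB byteDiff) is an illustrative
 * assumption, not the format produced by hello-samza's real feed.
 */
public class SimulatedWikipediaFeed {
  private static final String[] PAGES = {"Apache_Samza", "Apache_Kafka", "Stream_processing", "YARN"};
  private static final String[] USERS = {"Alice", "Bob", "Carol"};

  private final Random random;

  public SimulatedWikipediaFeed(long seed) {
    // A fixed seed makes the sequence of fake edits identical on every run.
    this.random = new Random(seed);
  }

  /** Returns the next fake edit as one tab-separated line. */
  public String nextEdit() {
    String page = PAGES[random.nextInt(PAGES.length)];
    String user = USERS[random.nextInt(USERS.length)];
    int byteDiff = random.nextInt(401) - 200; // uniformly in [-200, 200]
    return page + "\t" + user + "\t" + byteDiff;
  }

  /** Returns the next n fake edits in order. */
  public List<String> take(int n) {
    List<String> edits = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      edits.add(nextEdit());
    }
    return edits;
  }

  public static void main(String[] args) {
    // Two feeds built with the same seed emit the same edits, so an
    // integration test can assert on a known, fixed output.
    List<String> first = new SimulatedWikipediaFeed(42L).take(5);
    List<String> second = new SimulatedWikipediaFeed(42L).take(5);
    System.out.println(first.equals(second)); // prints "true"
    first.forEach(System.out::println);
  }
}
```

Seeding, rather than replaying a recorded file, keeps the sketch free of external dependencies; replaying a file is the kind of input the FS reader from SAMZA-138 would cover.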

In addition, if we use the FS reader from SAMZA-138, it will also be a good example of writing a SystemFactory (besides the out-of-the-box KafkaSystemFactory).



  was:
As reported by Sonali and Yan Fang, some corporations block the IRC service/port that hello-samza uses to read the Wikipedia edit feed, so users on those networks cannot run hello-samza successfully. http://mail-archives.apache.org/mod_mbox/samza-dev/201403.mbox/%3CB84B01583BEBBC45AD442B3F9045B8AC0ED46F80@048-CH1MPN3-331.048d.mgd.msft.net%3E

As suggested by Jakob Homan and Chris Riccomini, we should add an internal input stream for hello-samza as an alternative. There are two options:
1. Use simulated/fake data.
2. Use data from the local environment.

I lean toward the first approach. We can simulate Wikimedia data (though it is a little boring), because that lets us reuse the WikipediaParserStreamTask and WikipediaStatsStreamTask. Another reason is that, since the data is simulated, the output is very predictable, which will help bring hello-samza into the integration test described in SAMZA-205.

In addition, if we use the FS reader from SAMZA-138, it will also be a good example of writing a SystemFactory (besides the out-of-the-box KafkaSystemFactory).




> Add internal input stream for hello-samza
> -----------------------------------------
>
>                 Key: SAMZA-235
>                 URL: https://issues.apache.org/jira/browse/SAMZA-235
>             Project: Samza
>          Issue Type: Improvement
>          Components: hello-samza
>            Reporter: Yan Fang
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)