Posted to commits@samza.apache.org by "Yan Fang (JIRA)" <ji...@apache.org> on 2014/04/18 02:12:16 UTC

[jira] [Commented] (SAMZA-138) System that places specified file contents onto stream

    [ https://issues.apache.org/jira/browse/SAMZA-138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973605#comment-13973605 ] 

Yan Fang commented on SAMZA-138:
--------------------------------

Since SAMZA-235 needs a file reader, I picked up this ticket. Thanks to [~poltak], whose work made mine easier.

For the improvements:
{quote}
handle reading from multiple files (suggested alternative input specification- point 2)
{quote}
Done.

{quote}
use of filepos for IncomingMessageEnvelope offset (more info here)
{quote}
Hi [~criccomini], I need some help here. For the Samza job to pick up from an offset, I also need to implement a new SystemAdmin in FileReaderSystemFactory. It seems that I cannot reuse KafkaSystemAdmin directly, and reading SinglePartitionWithoutOffsetsSystemAdmin did not help much. So my questions are:
* What exactly is the goal of SystemAdmin?
* Does the getSystemStreamMetadata method need to talk directly to Kafka to get the metadata? I am a little confused about CheckPointManager here.
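
To make the filepos idea concrete, here is a minimal sketch of what I mean by using the file position as the offset. It is plain Java and does not touch any Samza interface; FilePosReader, Entry and readFrom are hypothetical names used only for illustration, and the input is assumed to be ASCII text with "\n" line endings.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Samza: reads lines and records the byte
// position after each one, so that position can serve as a resumable offset.
class FilePosReader {
    // A line together with the file position immediately after it.
    static final class Entry {
        final String line;
        final long nextOffset;

        Entry(String line, long nextOffset) {
            this.line = line;
            this.nextOffset = nextOffset;
        }
    }

    // Reads every remaining line, starting at the given byte offset.
    // RandomAccessFile.readLine maps each byte to one char, so this assumes ASCII input.
    static List<Entry> readFrom(Path file, long offset) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                entries.add(new Entry(line, raf.getFilePointer()));
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("filepos", ".txt");
        Files.write(tmp, "a\nbb\nccc\n".getBytes(StandardCharsets.US_ASCII));

        // Pretend the position after the first line was checkpointed.
        long checkpoint = readFrom(tmp, 0L).get(0).nextOffset;

        // A restarted reader seeks to the checkpoint and continues from there.
        for (Entry e : readFrom(tmp, checkpoint)) {
            System.out.println(e.nextOffset + " " + e.line);
        }
        Files.delete(tmp);
    }
}
```

How such a position would be surfaced through SystemAdmin and the checkpoint is exactly the part I am unsure about.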

{quote}
come up with a reasonable bounded queue threshold (the value of 100 was arbitrary, as I was unsure of a reasonable value here)
{quote}
It is 100 by default, and I do not plan to change it. Will SAMZA-220 affect this?
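
For reference, a rough sketch of how a bounded queue with a threshold of 100 behaves when the reader gets ahead of the task. It uses only java.util.concurrent; BoundedQueueSketch, THRESHOLD and fill are hypothetical names, and this is not necessarily how the attached FileReaderConsumer holds its messages.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a bounded buffer between a file reader and a consumer.
class BoundedQueueSketch {
    // The threshold discussed in this ticket.
    static final int THRESHOLD = 100;

    // Tries to enqueue `count` lines, waiting briefly when the queue is full.
    // Returns how many lines were accepted before the queue pushed back.
    static int fill(BlockingQueue<String> queue, int count) throws InterruptedException {
        int accepted = 0;
        for (int i = 0; i < count; i++) {
            if (!queue.offer("line-" + i, 10, TimeUnit.MILLISECONDS)) {
                break; // queue is full and nobody is draining it
            }
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(THRESHOLD);
        int accepted = fill(queue, 150);
        System.out.println("accepted=" + accepted
                + " remainingCapacity=" + queue.remainingCapacity());

        // Once the consumer takes a message, the reader can enqueue again.
        queue.take();
        System.out.println("offer after take: " + queue.offer("next"));
    }
}
```

The bound only limits how many unread lines sit in memory; a larger value trades memory for fewer stalls of the reading thread.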

{quote}
better handling for the exceptions encountered (I wasn't 100% sure about some of them)
{quote}
I wrap all the exceptions in SamzaException. Whenever a SamzaException is thrown, will the Samza job be killed?
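
The wrapping pattern looks roughly like the sketch below. To keep it compilable without Samza on the classpath, it uses a stand-in class (StandInSamzaException, a hypothetical name) in place of the real SamzaException; WrapSketch and firstChar are likewise made up for illustration.

```java
import java.io.FileReader;
import java.io.IOException;

// Hypothetical illustration of wrapping checked I/O exceptions in one unchecked type.
class WrapSketch {
    // Stand-in for SamzaException so the sketch has no Samza dependency.
    static class StandInSamzaException extends RuntimeException {
        StandInSamzaException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Reads the first character of a file, rethrowing any IOException as the
    // unchecked wrapper while keeping the original exception as the cause.
    static int firstChar(String path) {
        try (FileReader reader = new FileReader(path)) {
            return reader.read();
        } catch (IOException e) {
            throw new StandInSamzaException("Could not read " + path, e);
        }
    }

    public static void main(String[] args) {
        try {
            firstChar("no-such-dir-for-sketch/missing.txt");
        } catch (StandInSamzaException e) {
            // The original exception stays available for logging and diagnosis.
            System.out.println(e.getMessage() + " caused by "
                    + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Keeping the cause means the original stack trace is not lost, whatever the container ends up doing with the wrapped exception.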


 

> System that places specified file contents onto stream
> ------------------------------------------------------
>
>                 Key: SAMZA-138
>                 URL: https://issues.apache.org/jira/browse/SAMZA-138
>             Project: Samza
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>         Environment: RHELinux 2.6.18-371.4.1.el5
>            Reporter: Jonathan Poltak Samosir
>            Priority: Minor
>              Labels: feature, newbie, patch
>         Attachments: FileReaderConsumer.java, FileReaderSystemFactory.java
>
>
> A fairly straightforward Samza System that reads from a specified file, and places that file's contents onto a SystemStreamPartition for use as input for a StreamTask.
> Roughly based off how the hello-samza example project's WikipediaSystem works (more the SystemConsumerFactory rather than SystemConsumer class). 
> Probably needs a bit of work, but basic functionality works as intended. Hopefully useful to some, either as a functioning system or as a base for a more robust and functionally-promising system that you wish to implement.
> Some suggested improvements (not yet implemented):
> * handle reading from multiple files ([suggested alternative input specification|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA7465%40ESV4-MBX01.linkedin.biz%3E]- point 2)
> * use of filepos for IncomingMessageEnvelope offset ([more info here|https://mail-archives.apache.org/mod_mbox/incubator-samza-dev/201401.mbox/%3C1B43C7411DB20E47AB0FB62E7262B80179BA749D%40ESV4-MBX01.linkedin.biz%3E])
> * come up with a reasonable bounded queue threshold (the value of 100 was arbitrary, as I was unsure of a reasonable value here) 
> * better handling for the exceptions encountered (I wasn't 100% sure about some of them)



--
This message was sent by Atlassian JIRA
(v6.2#6252)