You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@apex.apache.org by David Yan <da...@datatorrent.com> on 2015/12/23 22:42:17 UTC

High level API: Request for ideas

Hi fellow Apex developers:

Apex has a comprehensive API for constructing DAG topologies for streaming
applications, using operators, ports and streams. But this may seem too
much for folks who just want to build simple applications, or just to learn
about Apex. For example, when you compare the code to do word count in
Apex with Spark Streaming or Flink, Apex requires much more code.

Apex:
https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main

Spark Streaming:
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java

Flink:
https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java

Note that their Scala versions are even simpler to use.

The high-level requirements I have in mind is as follow:

1. A simple-to-use high-level API similar to what Spark Streaming and Flink
have. And from the high-level API, the Apex engine will construct the
actual DAG topology at launch time.

2. The first language we will support is Java, but we will also want to
support Scala and possibly Python at some point, so the high-level API
should make it easy for implementing bindings for at least these two
languages.

3. We should be able to use the high-level API in Apex App Package (apa)
file, so that dtcli can launch it just like a regular apa today.

Please provide your ideas and thoughts on this topic.

Thanks,

David

Re: High level API: Request for ideas

Posted by Tushar Gosavi <tu...@datatorrent.com>.

+1 for Scala stream like api.

I would like to help developing Scala like API.  I was doing prototype for
similar api in scala to generate linear DAG.  I was facing problem
designing stream like
api to generate arbitrary DAG, that our platform supports. and support all
functionality of the the platform, like specifying attributes, adding
custom operators to DAG.

The repository is :
https://github.com/tushargosavi/apexstream

A simple wordcount application is, With lambda function support, it is very
easy to write simple applications.

```java
@ApplicationAnnotation(name="TestApp")
class TestApp extends StreamingApplication {

  override def populateDAG(dag: DAG, conf: Configuration): Unit = {

    val ctx : DTContext = new DTContext(dag, conf)

    ctx.fileInput("/user/tushar/data") // this wraps FileInputOperator ,
      .flatMap(_.split(" ")) // split the word in split and emit individual
words
      .filter(_.length > 0) // filter out empty words
      .addOperator[java.util.Map[String, Int]](new UniqueCounter()) // use
malhar operator
      .flatMap(_.asScala.map(_.toString())) // convert map to key value
pairs
      .print()

    ctx.build
  }
}
```

The fileInput function is nothing but
```
  def fileInput(dir : String) : Stream[String] = {
    val op = new FileLineInputOperator()
    op.setDirectory(dir)
    source[String](op)
  }
```


The repository has basic support for map, flatMap, filter, reduce, count
support which works on application window boundaries.  We will also
need shell where user can quickly type application and submit it execution
from shell directly while testing.

Another approach is to adopt Fink DataStream API or Spark Stream API. But I
feel having similar but our own API which can allow specifying all
attributes, like
partitioner, streamcodec, other attributes will be a big +.


- Tushar.



On Thu, Dec 24, 2015 at 11:28 AM, Sandeep Deshmukh <sa...@datatorrent.com>
wrote:

> +1 on not coming up with our own APIs. We should adapt to existing ones  so
> that there is additional learning curve for Apex users.
>
> It is more of a functional way of specifying the DAG and a subset of scala
> could be a good starting point.
>
> Regards,
> Sandeep
>
> On Thu, Dec 24, 2015 at 5:21 AM, Siyuan Hua <si...@datatorrent.com>
> wrote:
>
> > My first suggestion is we should focus on Stream API(or change the name
> we
> > call it) for now.  High-level API is confusion and could be anything that
> > helps.
> >
> > Stream is in fact more well-known concept other than Operators, ports,
> > connector, etc. I think the idea originate from scala sequence API(
> > http://www.scala-lang.org/api/current/#scala.collection.Seq)
> > And the term "Stream" already implies some minimal function we need ex.
> > "map"(t1->f->t1'), "reduce" (t1, t2,...  -> f -> t1'), "filter" (t1
> ->f(if
> > true) -> t1)
> > We shouldn't come up with arbitrary things so the API would become
> > cumbersome and hard to learn.
> >
> >
> >
> >
> >
> >
> > On Wed, Dec 23, 2015 at 3:12 PM, Siyuan Hua <si...@datatorrent.com>
> > wrote:
> >
> > > Another API that could be a reference is
> > > http://storm.apache.org/documentation/Trident-API-Overview.html
> > >
> > > On Wed, Dec 23, 2015 at 3:09 PM, Ashwin Chandra Putta <
> > > ashwinchandrap@gmail.com> wrote:
> > >
> > >> // Made few edits, ignore previous mail. Read this instead.
> > >>
> > >> David,
> > >>
> > >> I can imagine that it boils down to something like these function
> calls.
> > >>
> > >> DtString lines = readLines(new LineReader());
> > >> DtString words = lines.split(new LineSplitter());
> > >> DtNumber count  = words.count(new Counter());
> > >> count.print(new ConoleOutputOperator());
> > >>
> > >> Or
> > >>
> > >> readLines(new LineReader())
> > >>   .split(new LineSplitter())
> > >>   .count(new Counter())
> > >>   .print(new ConoleOutputOperator());
> > >>
> > >>
> > >> which translates to
> > >>
> > >> Reader reader = dag.addOperator("ReadLines", new LineReader());
> > >> Splitter splitter = dag.addOperator("Split", new LineSplitter());
> > >> Counter counter = dag.addOperator("Count", new WordCounter());
> > >> ConsoleOutputOperator console = dag.addOperator("Print", new
> > >> ConsoleOutputOperator());
> > >>
> > >> dag.addStream("lines", reader.output, splitter.input);
> > >> dag.addStream("words", splitter.output, counter.input);
> > >> dag.addStream("count", counter.output, console.input);
> > >>
> > >> Here are my initial thoughts:
> > >>
> > >> For the higher level api to work, we need the following support at
> > least.
> > >>
> > >> 1. The operators used in the higher level api should have concrete
> > >> implementations with all available input and output ports defined at
> the
> > >> abstract level. Have to think more about how multiple output ports
> will
> > >> play out.
> > >> 2. We need to define the objects that have method calls available on
> > them
> > >> that take operators as parameters.
> > >>
> > >> Eg: DtString can have method split and takes Splitter operator. And
> > >> Splitter operator should be abstract with input port type DtString and
> > >> output port type DtString. LineSplitter will be a concrete
> > implementation
> > >> of this operator.
> > >>
> > >> Regards,
> > >> Ashwin.
> > >>
> > >> On Wed, Dec 23, 2015 at 3:07 PM, Ashwin Chandra Putta <
> > >> ashwinchandrap@gmail.com> wrote:
> > >>
> > >> > David,
> > >> >
> > >> > I can imagine that it boils down to something like these function
> > calls.
> > >> >
> > >> > DtString lines = readLines(new LineReader());
> > >> > DtString words = lines.split(new LineSplitter());
> > >> > DtNumber count  = words.count(new Counter());
> > >> > count.print(new ConoleOutputOperator());
> > >> >
> > >> > Or
> > >> >
> > >> > readLines(new LineReader())
> > >> >   .split(new LineSplitter())
> > >> >   .count(new Counter())
> > >> >   .print(new ConoleOutputOperator());
> > >> >
> > >> >
> > >> > which translates to
> > >> >
> > >> > Reader reader = dag.addOperator("ReadLines", new LineReader());
> > >> > LineSplitter splitter = dag.addOperator("Split", new
> LineSplitter());
> > >> > WordCounter counter = dag.addOperator("Count", new Counter());
> > >> > ConsoleOutputOperator console = dag.addOperator("Print", new
> > >> > ConsoleOutputOperator());
> > >> >
> > >> > dag.addStream("lines", reader.output, splitter.input);
> > >> > dag.addStream("words", splitter.output, counter.input);
> > >> > dag.addStream("count", counter.output, console.input);
> > >> >
> > >> > Here are my initial thoughts:
> > >> >
> > >> > For the higher level api to work, we need the following support at
> > >> least.
> > >> >
> > >> > 1. The operators used in the higher level api should have concrete
> > >> > implementations with all available input and output ports defined at
> > the
> > >> > abstract level. Have to think more about how multiple output ports
> > will
> > >> > play out.
> > >> > 2. We need to define the objects that have method calls available on
> > >> them
> > >> > that take operators as parameters.
> > >> >
> > >> > Eg: DtString can have method split and takes Splitter operator. And
> > >> > Splitter operator should be abstract with input port type DtString
> and
> > >> > output port type DtString. LineSplitter will be a concrete
> > >> implementation
> > >> > of this operator.
> > >> >
> > >> > Regards,
> > >> > Ashwin.
> > >> >
> > >> > On Wed, Dec 23, 2015 at 1:42 PM, David Yan <da...@datatorrent.com>
> > >> wrote:
> > >> >
> > >> >> Hi fellow Apex developers:
> > >> >>
> > >> >> Apex has a comprehensive API for constructing DAG topologies for
> > >> streaming
> > >> >> applications, using operators, ports and streams.  But this may
> seem
> > >> too
> > >> >> much for folks who just want to build simple applications, or just
> to
> > >> >> learn
> > >> >> about Apex.  For example, when you compare the code to do word
> count
> > in
> > >> >> Apex with Spark Streaming or Flink, Apex requires much more code.
> > >> >>
> > >> >> Apex:
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
> > >> >>
> > >> >> Spark Streaming:
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
> > >> >>
> > >> >> Flink:
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
> > >> >>
> > >> >> Note that their Scala versions are even simpler to use.
> > >> >>
> > >> >> The high-level requirements I have in mind is as follow:
> > >> >>
> > >> >> 1. A simple-to-use high-level API similar to what Spark Streaming
> and
> > >> >> Flink
> > >> >> have. And from the high-level API, the Apex engine will construct
> the
> > >> >> actual DAG topology at launch time.
> > >> >>
> > >> >> 2. The first language we will support is Java, but we will also
> want
> > to
> > >> >> support Scala and possibly Python at some point, so the high-level
> > API
> > >> >> should make it easy for implementing bindings for at least these
> two
> > >> >> languages.
> > >> >>
> > >> >> 3. We should be able to use the high-level API in Apex App Package
> > >> (apa)
> > >> >> file, so that dtcli can launch it just like a regular apa today.
> > >> >>
> > >> >> Please provide your ideas and thoughts on this topic.
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> David
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Regards,
> > >> > Ashwin.
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> Regards,
> > >> Ashwin.
> > >>
> > >
> > >
> >
>

Re: High level API: Request for ideas

Posted by Chinmay Kolhatkar <ch...@datatorrent.com>.

+1 for not coming up with our own APIs.

But on the other hand the functional programming paradigm is something that
people are familiar with.
To find a middle spot we can have the DAG object to support a builder
pattern.
Specially for streaming application, the builder patterns can describe the
step by step process to user of Apex.

Basically as follows:


*dag.addOperation("ReadLines", new LineReader())*
*      .**addOperation**("Split", new LineSplitter());*
*      .**addOperation**("Count", new Counter())*
*      .**addOperation**("Print", new ConsoleOutputOperator());*

This will translate to following:









*Reader reader = dag.addOperator("ReadLines", new
LineReader());LineSplitter splitter = dag.addOperator("Split", new
LineSplitter());WordCounter counter = dag.addOperator("Count", new
Counter());ConsoleOutputOperator console = dag.addOperator("Print",
new ConsoleOutputOperator());dag.addStream("lines", reader.output,
splitter.input);dag.addStream("words", splitter.output,
counter.input);dag.addStream("count", counter.output, console.input);*

Thoughts?


~ Chinmay.

On Thu, Dec 24, 2015 at 11:28 AM, Sandeep Deshmukh <sa...@datatorrent.com>
wrote:

> +1 on not coming up with our own APIs. We should adapt to existing ones  so
> that there is additional learning curve for Apex users.
>
> It is more of a functional way of specifying the DAG and a subset of scala
> could be a good starting point.
>
> Regards,
> Sandeep
>
> On Thu, Dec 24, 2015 at 5:21 AM, Siyuan Hua <si...@datatorrent.com>
> wrote:
>
> > My first suggestion is we should focus on Stream API(or change the name
> we
> > call it) for now.  High-level API is confusion and could be anything that
> > helps.
> >
> > Stream is in fact more well-known concept other than Operators, ports,
> > connector, etc. I think the idea originate from scala sequence API(
> > http://www.scala-lang.org/api/current/#scala.collection.Seq)
> > And the term "Stream" already implies some minimal function we need ex.
> > "map"(t1->f->t1'), "reduce" (t1, t2,...  -> f -> t1'), "filter" (t1
> ->f(if
> > true) -> t1)
> > We shouldn't come up with arbitrary things so the API would become
> > cumbersome and hard to learn.
> >
> >
> >
> >
> >
> >
> > On Wed, Dec 23, 2015 at 3:12 PM, Siyuan Hua <si...@datatorrent.com>
> > wrote:
> >
> > > Another API that could be a reference is
> > > http://storm.apache.org/documentation/Trident-API-Overview.html
> > >
> > > On Wed, Dec 23, 2015 at 3:09 PM, Ashwin Chandra Putta <
> > > ashwinchandrap@gmail.com> wrote:
> > >
> > >> // Made few edits, ignore previous mail. Read this instead.
> > >>
> > >> David,
> > >>
> > >> I can imagine that it boils down to something like these function
> calls.
> > >>
> > >> DtString lines = readLines(new LineReader());
> > >> DtString words = lines.split(new LineSplitter());
> > >> DtNumber count  = words.count(new Counter());
> > >> count.print(new ConoleOutputOperator());
> > >>
> > >> Or
> > >>
> > >> readLines(new LineReader())
> > >>   .split(new LineSplitter())
> > >>   .count(new Counter())
> > >>   .print(new ConoleOutputOperator());
> > >>
> > >>
> > >> which translates to
> > >>
> > >> Reader reader = dag.addOperator("ReadLines", new LineReader());
> > >> Splitter splitter = dag.addOperator("Split", new LineSplitter());
> > >> Counter counter = dag.addOperator("Count", new WordCounter());
> > >> ConsoleOutputOperator console = dag.addOperator("Print", new
> > >> ConsoleOutputOperator());
> > >>
> > >> dag.addStream("lines", reader.output, splitter.input);
> > >> dag.addStream("words", splitter.output, counter.input);
> > >> dag.addStream("count", counter.output, console.input);
> > >>
> > >> Here are my initial thoughts:
> > >>
> > >> For the higher level api to work, we need the following support at
> > least.
> > >>
> > >> 1. The operators used in the higher level api should have concrete
> > >> implementations with all available input and output ports defined at
> the
> > >> abstract level. Have to think more about how multiple output ports
> will
> > >> play out.
> > >> 2. We need to define the objects that have method calls available on
> > them
> > >> that take operators as parameters.
> > >>
> > >> Eg: DtString can have method split and takes Splitter operator. And
> > >> Splitter operator should be abstract with input port type DtString and
> > >> output port type DtString. LineSplitter will be a concrete
> > implementation
> > >> of this operator.
> > >>
> > >> Regards,
> > >> Ashwin.
> > >>
> > >> On Wed, Dec 23, 2015 at 3:07 PM, Ashwin Chandra Putta <
> > >> ashwinchandrap@gmail.com> wrote:
> > >>
> > >> > David,
> > >> >
> > >> > I can imagine that it boils down to something like these function
> > calls.
> > >> >
> > >> > DtString lines = readLines(new LineReader());
> > >> > DtString words = lines.split(new LineSplitter());
> > >> > DtNumber count  = words.count(new Counter());
> > >> > count.print(new ConoleOutputOperator());
> > >> >
> > >> > Or
> > >> >
> > >> > readLines(new LineReader())
> > >> >   .split(new LineSplitter())
> > >> >   .count(new Counter())
> > >> >   .print(new ConoleOutputOperator());
> > >> >
> > >> >
> > >> > which translates to
> > >> >
> > >> > Reader reader = dag.addOperator("ReadLines", new LineReader());
> > >> > LineSplitter splitter = dag.addOperator("Split", new
> LineSplitter());
> > >> > WordCounter counter = dag.addOperator("Count", new Counter());
> > >> > ConsoleOutputOperator console = dag.addOperator("Print", new
> > >> > ConsoleOutputOperator());
> > >> >
> > >> > dag.addStream("lines", reader.output, splitter.input);
> > >> > dag.addStream("words", splitter.output, counter.input);
> > >> > dag.addStream("count", counter.output, console.input);
> > >> >
> > >> > Here are my initial thoughts:
> > >> >
> > >> > For the higher level api to work, we need the following support at
> > >> least.
> > >> >
> > >> > 1. The operators used in the higher level api should have concrete
> > >> > implementations with all available input and output ports defined at
> > the
> > >> > abstract level. Have to think more about how multiple output ports
> > will
> > >> > play out.
> > >> > 2. We need to define the objects that have method calls available on
> > >> them
> > >> > that take operators as parameters.
> > >> >
> > >> > Eg: DtString can have method split and takes Splitter operator. And
> > >> > Splitter operator should be abstract with input port type DtString
> and
> > >> > output port type DtString. LineSplitter will be a concrete
> > >> implementation
> > >> > of this operator.
> > >> >
> > >> > Regards,
> > >> > Ashwin.
> > >> >
> > >> > On Wed, Dec 23, 2015 at 1:42 PM, David Yan <da...@datatorrent.com>
> > >> wrote:
> > >> >
> > >> >> Hi fellow Apex developers:
> > >> >>
> > >> >> Apex has a comprehensive API for constructing DAG topologies for
> > >> streaming
> > >> >> applications, using operators, ports and streams.  But this may
> seem
> > >> too
> > >> >> much for folks who just want to build simple applications, or just
> to
> > >> >> learn
> > >> >> about Apex.  For example, when you compare the code to do word
> count
> > in
> > >> >> Apex with Spark Streaming or Flink, Apex requires much more code.
> > >> >>
> > >> >> Apex:
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
> > >> >>
> > >> >> Spark Streaming:
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
> > >> >>
> > >> >> Flink:
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
> > >> >>
> > >> >> Note that their Scala versions are even simpler to use.
> > >> >>
> > >> >> The high-level requirements I have in mind is as follow:
> > >> >>
> > >> >> 1. A simple-to-use high-level API similar to what Spark Streaming
> and
> > >> >> Flink
> > >> >> have. And from the high-level API, the Apex engine will construct
> the
> > >> >> actual DAG topology at launch time.
> > >> >>
> > >> >> 2. The first language we will support is Java, but we will also
> want
> > to
> > >> >> support Scala and possibly Python at some point, so the high-level
> > API
> > >> >> should make it easy for implementing bindings for at least these
> two
> > >> >> languages.
> > >> >>
> > >> >> 3. We should be able to use the high-level API in Apex App Package
> > >> (apa)
> > >> >> file, so that dtcli can launch it just like a regular apa today.
> > >> >>
> > >> >> Please provide your ideas and thoughts on this topic.
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> David
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Regards,
> > >> > Ashwin.
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> Regards,
> > >> Ashwin.
> > >>
> > >
> > >
> >
>

Re: High level API: Request for ideas

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.

+1 on not coming up with our own APIs. We should adapt to existing ones  so
that there is additional learning curve for Apex users.

It is more of a functional way of specifying the DAG and a subset of scala
could be a good starting point.

Regards,
Sandeep

On Thu, Dec 24, 2015 at 5:21 AM, Siyuan Hua <si...@datatorrent.com> wrote:

> My first suggestion is we should focus on Stream API(or change the name we
> call it) for now.  High-level API is confusion and could be anything that
> helps.
>
> Stream is in fact more well-known concept other than Operators, ports,
> connector, etc. I think the idea originate from scala sequence API(
> http://www.scala-lang.org/api/current/#scala.collection.Seq)
> And the term "Stream" already implies some minimal function we need ex.
> "map"(t1->f->t1'), "reduce" (t1, t2,...  -> f -> t1'), "filter" (t1 ->f(if
> true) -> t1)
> We shouldn't come up with arbitrary things so the API would become
> cumbersome and hard to learn.
>
>
>
>
>
>
> On Wed, Dec 23, 2015 at 3:12 PM, Siyuan Hua <si...@datatorrent.com>
> wrote:
>
> > Another API that could be a reference is
> > http://storm.apache.org/documentation/Trident-API-Overview.html
> >
> > On Wed, Dec 23, 2015 at 3:09 PM, Ashwin Chandra Putta <
> > ashwinchandrap@gmail.com> wrote:
> >
> >> // Made few edits, ignore previous mail. Read this instead.
> >>
> >> David,
> >>
> >> I can imagine that it boils down to something like these function calls.
> >>
> >> DtString lines = readLines(new LineReader());
> >> DtString words = lines.split(new LineSplitter());
> >> DtNumber count  = words.count(new Counter());
> >> count.print(new ConoleOutputOperator());
> >>
> >> Or
> >>
> >> readLines(new LineReader())
> >>   .split(new LineSplitter())
> >>   .count(new Counter())
> >>   .print(new ConoleOutputOperator());
> >>
> >>
> >> which translates to
> >>
> >> Reader reader = dag.addOperator("ReadLines", new LineReader());
> >> Splitter splitter = dag.addOperator("Split", new LineSplitter());
> >> Counter counter = dag.addOperator("Count", new WordCounter());
> >> ConsoleOutputOperator console = dag.addOperator("Print", new
> >> ConsoleOutputOperator());
> >>
> >> dag.addStream("lines", reader.output, splitter.input);
> >> dag.addStream("words", splitter.output, counter.input);
> >> dag.addStream("count", counter.output, console.input);
> >>
> >> Here are my initial thoughts:
> >>
> >> For the higher level api to work, we need the following support at
> least.
> >>
> >> 1. The operators used in the higher level api should have concrete
> >> implementations with all available input and output ports defined at the
> >> abstract level. Have to think more about how multiple output ports will
> >> play out.
> >> 2. We need to define the objects that have method calls available on
> them
> >> that take operators as parameters.
> >>
> >> Eg: DtString can have method split and takes Splitter operator. And
> >> Splitter operator should be abstract with input port type DtString and
> >> output port type DtString. LineSplitter will be a concrete
> implementation
> >> of this operator.
> >>
> >> Regards,
> >> Ashwin.
> >>
> >> On Wed, Dec 23, 2015 at 3:07 PM, Ashwin Chandra Putta <
> >> ashwinchandrap@gmail.com> wrote:
> >>
> >> > David,
> >> >
> >> > I can imagine that it boils down to something like these function
> calls.
> >> >
> >> > DtString lines = readLines(new LineReader());
> >> > DtString words = lines.split(new LineSplitter());
> >> > DtNumber count  = words.count(new Counter());
> >> > count.print(new ConoleOutputOperator());
> >> >
> >> > Or
> >> >
> >> > readLines(new LineReader())
> >> >   .split(new LineSplitter())
> >> >   .count(new Counter())
> >> >   .print(new ConoleOutputOperator());
> >> >
> >> >
> >> > which translates to
> >> >
> >> > Reader reader = dag.addOperator("ReadLines", new LineReader());
> >> > LineSplitter splitter = dag.addOperator("Split", new LineSplitter());
> >> > WordCounter counter = dag.addOperator("Count", new Counter());
> >> > ConsoleOutputOperator console = dag.addOperator("Print", new
> >> > ConsoleOutputOperator());
> >> >
> >> > dag.addStream("lines", reader.output, splitter.input);
> >> > dag.addStream("words", splitter.output, counter.input);
> >> > dag.addStream("count", counter.output, console.input);
> >> >
> >> > Here are my initial thoughts:
> >> >
> >> > For the higher level api to work, we need the following support at
> >> least.
> >> >
> >> > 1. The operators used in the higher level api should have concrete
> >> > implementations with all available input and output ports defined at
> the
> >> > abstract level. Have to think more about how multiple output ports
> will
> >> > play out.
> >> > 2. We need to define the objects that have method calls available on
> >> them
> >> > that take operators as parameters.
> >> >
> >> > Eg: DtString can have method split and takes Splitter operator. And
> >> > Splitter operator should be abstract with input port type DtString and
> >> > output port type DtString. LineSplitter will be a concrete
> >> implementation
> >> > of this operator.
> >> >
> >> > Regards,
> >> > Ashwin.
> >> >
> >> > On Wed, Dec 23, 2015 at 1:42 PM, David Yan <da...@datatorrent.com>
> >> wrote:
> >> >
> >> >> Hi fellow Apex developers:
> >> >>
> >> >> Apex has a comprehensive API for constructing DAG topologies for
> >> streaming
> >> >> applications, using operators, ports and streams.  But this may seem
> >> too
> >> >> much for folks who just want to build simple applications, or just to
> >> >> learn
> >> >> about Apex.  For example, when you compare the code to do word count
> in
> >> >> Apex with Spark Streaming or Flink, Apex requires much more code.
> >> >>
> >> >> Apex:
> >> >>
> >> >>
> >>
> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
> >> >>
> >> >> Spark Streaming:
> >> >>
> >> >>
> >>
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
> >> >>
> >> >> Flink:
> >> >>
> >> >>
> >>
> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
> >> >>
> >> >> Note that their Scala versions are even simpler to use.
> >> >>
> >> >> The high-level requirements I have in mind is as follow:
> >> >>
> >> >> 1. A simple-to-use high-level API similar to what Spark Streaming and
> >> >> Flink
> >> >> have. And from the high-level API, the Apex engine will construct the
> >> >> actual DAG topology at launch time.
> >> >>
> >> >> 2. The first language we will support is Java, but we will also want
> to
> >> >> support Scala and possibly Python at some point, so the high-level
> API
> >> >> should make it easy for implementing bindings for at least these two
> >> >> languages.
> >> >>
> >> >> 3. We should be able to use the high-level API in Apex App Package
> >> (apa)
> >> >> file, so that dtcli can launch it just like a regular apa today.
> >> >>
> >> >> Please provide your ideas and thoughts on this topic.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> David
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Regards,
> >> > Ashwin.
> >> >
> >>
> >>
> >>
> >> --
> >>
> >> Regards,
> >> Ashwin.
> >>
> >
> >
>

Re: High level API: Request for ideas

Posted by Siyuan Hua <si...@datatorrent.com>.

My first suggestion is we should focus on Stream API(or change the name we
call it) for now.  High-level API is confusion and could be anything that
helps.

Stream is in fact more well-known concept other than Operators, ports,
connector, etc. I think the idea originate from scala sequence API(
http://www.scala-lang.org/api/current/#scala.collection.Seq)
And the term "Stream" already implies some minimal function we need ex.
"map"(t1->f->t1'), "reduce" (t1, t2,...  -> f -> t1'), "filter" (t1 ->f(if
true) -> t1)
We shouldn't come up with arbitrary things so the API would become
cumbersome and hard to learn.






On Wed, Dec 23, 2015 at 3:12 PM, Siyuan Hua <si...@datatorrent.com> wrote:

> Another API that could be a reference is
> http://storm.apache.org/documentation/Trident-API-Overview.html
>
> On Wed, Dec 23, 2015 at 3:09 PM, Ashwin Chandra Putta <
> ashwinchandrap@gmail.com> wrote:
>
>> // Made few edits, ignore previous mail. Read this instead.
>>
>> David,
>>
>> I can imagine that it boils down to something like these function calls.
>>
>> DtString lines = readLines(new LineReader());
>> DtString words = lines.split(new LineSplitter());
>> DtNumber count  = words.count(new Counter());
>> count.print(new ConoleOutputOperator());
>>
>> Or
>>
>> readLines(new LineReader())
>>   .split(new LineSplitter())
>>   .count(new Counter())
>>   .print(new ConoleOutputOperator());
>>
>>
>> which translates to
>>
>> Reader reader = dag.addOperator("ReadLines", new LineReader());
>> Splitter splitter = dag.addOperator("Split", new LineSplitter());
>> Counter counter = dag.addOperator("Count", new WordCounter());
>> ConsoleOutputOperator console = dag.addOperator("Print", new
>> ConsoleOutputOperator());
>>
>> dag.addStream("lines", reader.output, splitter.input);
>> dag.addStream("words", splitter.output, counter.input);
>> dag.addStream("count", counter.output, console.input);
>>
>> Here are my initial thoughts:
>>
>> For the higher level api to work, we need the following support at least.
>>
>> 1. The operators used in the higher level api should have concrete
>> implementations with all available input and output ports defined at the
>> abstract level. Have to think more about how multiple output ports will
>> play out.
>> 2. We need to define the objects that have method calls available on them
>> that take operators as parameters.
>>
>> Eg: DtString can have method split and takes Splitter operator. And
>> Splitter operator should be abstract with input port type DtString and
>> output port type DtString. LineSplitter will be a concrete implementation
>> of this operator.
>>
>> Regards,
>> Ashwin.
>>
>> On Wed, Dec 23, 2015 at 3:07 PM, Ashwin Chandra Putta <
>> ashwinchandrap@gmail.com> wrote:
>>
>> > David,
>> >
>> > I can imagine that it boils down to something like these function calls.
>> >
>> > DtString lines = readLines(new LineReader());
>> > DtString words = lines.split(new LineSplitter());
>> > DtNumber count  = words.count(new Counter());
>> > count.print(new ConoleOutputOperator());
>> >
>> > Or
>> >
>> > readLines(new LineReader())
>> >   .split(new LineSplitter())
>> >   .count(new Counter())
>> >   .print(new ConoleOutputOperator());
>> >
>> >
>> > which translates to
>> >
>> > Reader reader = dag.addOperator("ReadLines", new LineReader());
>> > LineSplitter splitter = dag.addOperator("Split", new LineSplitter());
>> > WordCounter counter = dag.addOperator("Count", new Counter());
>> > ConsoleOutputOperator console = dag.addOperator("Print", new
>> > ConsoleOutputOperator());
>> >
>> > dag.addStream("lines", reader.output, splitter.input);
>> > dag.addStream("words", splitter.output, counter.input);
>> > dag.addStream("count", counter.output, console.input);
>> >
>> > Here are my initial thoughts:
>> >
>> > For the higher level api to work, we need the following support at
>> least.
>> >
>> > 1. The operators used in the higher level api should have concrete
>> > implementations with all available input and output ports defined at the
>> > abstract level. Have to think more about how multiple output ports will
>> > play out.
>> > 2. We need to define the objects that have method calls available on
>> them
>> > that take operators as parameters.
>> >
>> > Eg: DtString can have method split and takes Splitter operator. And
>> > Splitter operator should be abstract with input port type DtString and
>> > output port type DtString. LineSplitter will be a concrete
>> implementation
>> > of this operator.
>> >
>> > Regards,
>> > Ashwin.
>> >
>> > On Wed, Dec 23, 2015 at 1:42 PM, David Yan <da...@datatorrent.com>
>> wrote:
>> >
>> >> Hi fellow Apex developers:
>> >>
>> >> Apex has a comprehensive API for constructing DAG topologies for
>> streaming
>> >> applications, using operators, ports and streams.  But this may seem
>> too
>> >> much for folks who just want to build simple applications, or just to
>> >> learn
>> >> about Apex.  For example, when you compare the code to do word count in
>> >> Apex with Spark Streaming or Flink, Apex requires much more code.
>> >>
>> >> Apex:
>> >>
>> >>
>> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
>> >>
>> >> Spark Streaming:
>> >>
>> >>
>> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
>> >>
>> >> Flink:
>> >>
>> >>
>> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
>> >>
>> >> Note that their Scala versions are even simpler to use.
>> >>
>> >> The high-level requirements I have in mind is as follow:
>> >>
>> >> 1. A simple-to-use high-level API similar to what Spark Streaming and
>> >> Flink
>> >> have. And from the high-level API, the Apex engine will construct the
>> >> actual DAG topology at launch time.
>> >>
>> >> 2. The first language we will support is Java, but we will also want to
>> >> support Scala and possibly Python at some point, so the high-level API
>> >> should make it easy for implementing bindings for at least these two
>> >> languages.
>> >>
>> >> 3. We should be able to use the high-level API in Apex App Package
>> (apa)
>> >> file, so that dtcli can launch it just like a regular apa today.
>> >>
>> >> Please provide your ideas and thoughts on this topic.
>> >>
>> >> Thanks,
>> >>
>> >> David
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > Regards,
>> > Ashwin.
>> >
>>
>>
>>
>> --
>>
>> Regards,
>> Ashwin.
>>
>
>

Re: High level API: Request for ideas

Posted by Siyuan Hua <si...@datatorrent.com>.

Another API that could be a reference is
http://storm.apache.org/documentation/Trident-API-Overview.html

On Wed, Dec 23, 2015 at 3:09 PM, Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> // Made few edits, ignore previous mail. Read this instead.
>
> David,
>
> I can imagine that it boils down to something like these function calls.
>
> DtString lines = readLines(new LineReader());
> DtString words = lines.split(new LineSplitter());
> DtNumber count  = words.count(new Counter());
> count.print(new ConoleOutputOperator());
>
> Or
>
> readLines(new LineReader())
>   .split(new LineSplitter())
>   .count(new Counter())
>   .print(new ConoleOutputOperator());
>
>
> which translates to
>
> Reader reader = dag.addOperator("ReadLines", new LineReader());
> Splitter splitter = dag.addOperator("Split", new LineSplitter());
> Counter counter = dag.addOperator("Count", new WordCounter());
> ConsoleOutputOperator console = dag.addOperator("Print", new
> ConsoleOutputOperator());
>
> dag.addStream("lines", reader.output, splitter.input);
> dag.addStream("words", splitter.output, counter.input);
> dag.addStream("count", counter.output, console.input);
>
> Here are my initial thoughts:
>
> For the higher level api to work, we need the following support at least.
>
> 1. The operators used in the higher level api should have concrete
> implementations with all available input and output ports defined at the
> abstract level. Have to think more about how multiple output ports will
> play out.
> 2. We need to define the objects that have method calls available on them
> that take operators as parameters.
>
> Eg: DtString can have method split and takes Splitter operator. And
> Splitter operator should be abstract with input port type DtString and
> output port type DtString. LineSplitter will be a concrete implementation
> of this operator.
>
> Regards,
> Ashwin.
>
> On Wed, Dec 23, 2015 at 3:07 PM, Ashwin Chandra Putta <
> ashwinchandrap@gmail.com> wrote:
>
> > David,
> >
> > I can imagine that it boils down to something like these function calls.
> >
> > DtString lines = readLines(new LineReader());
> > DtString words = lines.split(new LineSplitter());
> > DtNumber count  = words.count(new Counter());
> > count.print(new ConoleOutputOperator());
> >
> > Or
> >
> > readLines(new LineReader())
> >   .split(new LineSplitter())
> >   .count(new Counter())
> >   .print(new ConoleOutputOperator());
> >
> >
> > which translates to
> >
> > Reader reader = dag.addOperator("ReadLines", new LineReader());
> > LineSplitter splitter = dag.addOperator("Split", new LineSplitter());
> > WordCounter counter = dag.addOperator("Count", new Counter());
> > ConsoleOutputOperator console = dag.addOperator("Print", new
> > ConsoleOutputOperator());
> >
> > dag.addStream("lines", reader.output, splitter.input);
> > dag.addStream("words", splitter.output, counter.input);
> > dag.addStream("count", counter.output, console.input);
> >
> > Here are my initial thoughts:
> >
> > For the higher level api to work, we need the following support at least.
> >
> > 1. The operators used in the higher level api should have concrete
> > implementations with all available input and output ports defined at the
> > abstract level. Have to think more about how multiple output ports will
> > play out.
> > 2. We need to define the objects that have method calls available on them
> > that take operators as parameters.
> >
> > Eg: DtString can have method split and takes Splitter operator. And
> > Splitter operator should be abstract with input port type DtString and
> > output port type DtString. LineSplitter will be a concrete implementation
> > of this operator.
> >
> > Regards,
> > Ashwin.
> >
> > On Wed, Dec 23, 2015 at 1:42 PM, David Yan <da...@datatorrent.com>
> wrote:
> >
> >> Hi fellow Apex developers:
> >>
> >> Apex has a comprehensive API for constructing DAG topologies for
> streaming
> >> applications, using operators, ports and streams.  But this may seem too
> >> much for folks who just want to build simple applications, or just to
> >> learn
> >> about Apex.  For example, when you compare the code to do word count in
> >> Apex with Spark Streaming or Flink, Apex requires much more code.
> >>
> >> Apex:
> >>
> >>
> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
> >>
> >> Spark Streaming:
> >>
> >>
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
> >>
> >> Flink:
> >>
> >>
> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
> >>
> >> Note that their Scala versions are even simpler to use.
> >>
> >> The high-level requirements I have in mind is as follow:
> >>
> >> 1. A simple-to-use high-level API similar to what Spark Streaming and
> >> Flink
> >> have. And from the high-level API, the Apex engine will construct the
> >> actual DAG topology at launch time.
> >>
> >> 2. The first language we will support is Java, but we will also want to
> >> support Scala and possibly Python at some point, so the high-level API
> >> should make it easy for implementing bindings for at least these two
> >> languages.
> >>
> >> 3. We should be able to use the high-level API in Apex App Package (apa)
> >> file, so that dtcli can launch it just like a regular apa today.
> >>
> >> Please provide your ideas and thoughts on this topic.
> >>
> >> Thanks,
> >>
> >> David
> >>
> >
> >
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>
>
>
> --
>
> Regards,
> Ashwin.
>

Re: High level API: Request for ideas

Posted by Ashwin Chandra Putta <as...@gmail.com>.

// Made few edits, ignore previous mail. Read this instead.

David,

I can imagine that it boils down to something like these function calls.

DtString lines = readLines(new LineReader());
DtString words = lines.split(new LineSplitter());
DtNumber count  = words.count(new Counter());
count.print(new ConoleOutputOperator());

Or

readLines(new LineReader())
  .split(new LineSplitter())
  .count(new Counter())
  .print(new ConoleOutputOperator());


which translates to

Reader reader = dag.addOperator("ReadLines", new LineReader());
Splitter splitter = dag.addOperator("Split", new LineSplitter());
Counter counter = dag.addOperator("Count", new WordCounter());
ConsoleOutputOperator console = dag.addOperator("Print", new
ConsoleOutputOperator());

dag.addStream("lines", reader.output, splitter.input);
dag.addStream("words", splitter.output, counter.input);
dag.addStream("count", counter.output, console.input);

Here are my initial thoughts:

For the higher level api to work, we need the following support at least.

1. The operators used in the higher level api should have concrete
implementations with all available input and output ports defined at the
abstract level. Have to think more about how multiple output ports will
play out.
2. We need to define the objects that have method calls available on them
that take operators as parameters.

Eg: DtString can have method split and takes Splitter operator. And
Splitter operator should be abstract with input port type DtString and
output port type DtString. LineSplitter will be a concrete implementation
of this operator.

Regards,
Ashwin.

On Wed, Dec 23, 2015 at 3:07 PM, Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> David,
>
> I can imagine that it boils down to something like these function calls.
>
> DtString lines = readLines(new LineReader());
> DtString words = lines.split(new LineSplitter());
> DtNumber count  = words.count(new Counter());
> count.print(new ConoleOutputOperator());
>
> Or
>
> readLines(new LineReader())
>   .split(new LineSplitter())
>   .count(new Counter())
>   .print(new ConoleOutputOperator());
>
>
> which translates to
>
> Reader reader = dag.addOperator("ReadLines", new LineReader());
> LineSplitter splitter = dag.addOperator("Split", new LineSplitter());
> WordCounter counter = dag.addOperator("Count", new Counter());
> ConsoleOutputOperator console = dag.addOperator("Print", new
> ConsoleOutputOperator());
>
> dag.addStream("lines", reader.output, splitter.input);
> dag.addStream("words", splitter.output, counter.input);
> dag.addStream("count", counter.output, console.input);
>
> Here are my initial thoughts:
>
> For the higher level api to work, we need the following support at least.
>
> 1. The operators used in the higher level api should have concrete
> implementations with all available input and output ports defined at the
> abstract level. Have to think more about how multiple output ports will
> play out.
> 2. We need to define the objects that have method calls available on them
> that take operators as parameters.
>
> Eg: DtString can have method split and takes Splitter operator. And
> Splitter operator should be abstract with input port type DtString and
> output port type DtString. LineSplitter will be a concrete implementation
> of this operator.
>
> Regards,
> Ashwin.
>
> On Wed, Dec 23, 2015 at 1:42 PM, David Yan <da...@datatorrent.com> wrote:
>
>> Hi fellow Apex developers:
>>
>> Apex has a comprehensive API for constructing DAG topologies for streaming
>> applications, using operators, ports and streams.  But this may seem too
>> much for folks who just want to build simple applications, or just to
>> learn
>> about Apex.  For example, when you compare the code to do word count in
>> Apex with Spark Streaming or Flink, Apex requires much more code.
>>
>> Apex:
>>
>> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
>>
>> Spark Streaming:
>>
>> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
>>
>> Flink:
>>
>> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
>>
>> Note that their Scala versions are even simpler to use.
>>
>> The high-level requirements I have in mind is as follow:
>>
>> 1. A simple-to-use high-level API similar to what Spark Streaming and
>> Flink
>> have. And from the high-level API, the Apex engine will construct the
>> actual DAG topology at launch time.
>>
>> 2. The first language we will support is Java, but we will also want to
>> support Scala and possibly Python at some point, so the high-level API
>> should make it easy for implementing bindings for at least these two
>> languages.
>>
>> 3. We should be able to use the high-level API in Apex App Package (apa)
>> file, so that dtcli can launch it just like a regular apa today.
>>
>> Please provide your ideas and thoughts on this topic.
>>
>> Thanks,
>>
>> David
>>
>
>
>
> --
>
> Regards,
> Ashwin.
>



-- 

Regards,
Ashwin.

Re: High level API: Request for ideas

Posted by Ashwin Chandra Putta <as...@gmail.com>.

David,

I can imagine that it boils down to something like these function calls.

DtString lines = readLines(new LineReader());
DtString words = lines.split(new LineSplitter());
DtNumber count  = words.count(new Counter());
count.print(new ConoleOutputOperator());

Or

readLines(new LineReader())
  .split(new LineSplitter())
  .count(new Counter())
  .print(new ConoleOutputOperator());


which translates to

Reader reader = dag.addOperator("ReadLines", new LineReader());
LineSplitter splitter = dag.addOperator("Split", new LineSplitter());
WordCounter counter = dag.addOperator("Count", new Counter());
ConsoleOutputOperator console = dag.addOperator("Print", new
ConsoleOutputOperator());

dag.addStream("lines", reader.output, splitter.input);
dag.addStream("words", splitter.output, counter.input);
dag.addStream("count", counter.output, console.input);

Here are my initial thoughts:

For the higher level api to work, we need the following support at least.

1. The operators used in the higher level api should have concrete
implementations with all available input and output ports defined at the
abstract level. Have to think more about how multiple output ports will
play out.
2. We need to define the objects that have method calls available on them
that take operators as parameters.

Eg: DtString can have method split and takes Splitter operator. And
Splitter operator should be abstract with input port type DtString and
output port type DtString. LineSplitter will be a concrete implementation
of this operator.

Regards,
Ashwin.

On Wed, Dec 23, 2015 at 1:42 PM, David Yan <da...@datatorrent.com> wrote:

> Hi fellow Apex developers:
>
> Apex has a comprehensive API for constructing DAG topologies for streaming
> applications, using operators, ports and streams.  But this may seem too
> much for folks who just want to build simple applications, or just to learn
> about Apex.  For example, when you compare the code to do word count in
> Apex with Spark Streaming or Flink, Apex requires much more code.
>
> Apex:
>
> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
>
> Spark Streaming:
>
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
>
> Flink:
>
> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
>
> Note that their Scala versions are even simpler to use.
>
> The high-level requirements I have in mind is as follow:
>
> 1. A simple-to-use high-level API similar to what Spark Streaming and Flink
> have. And from the high-level API, the Apex engine will construct the
> actual DAG topology at launch time.
>
> 2. The first language we will support is Java, but we will also want to
> support Scala and possibly Python at some point, so the high-level API
> should make it easy for implementing bindings for at least these two
> languages.
>
> 3. We should be able to use the high-level API in Apex App Package (apa)
> file, so that dtcli can launch it just like a regular apa today.
>
> Please provide your ideas and thoughts on this topic.
>
> Thanks,
>
> David
>



-- 

Regards,
Ashwin.

Re: High level API: Request for ideas

Posted by Munagala Ramanath <ra...@datatorrent.com>.

Here is a slightly different take on the high level API question:

There is a common notation used to textually specify graphs called DOT. It
is
described here:
https://en.wikipedia.org/wiki/DOT_(graph_description_language)
and a specification is available here:
http://www.graphviz.org/content/dot-language

It is used by a variety of graph visualization tools but most notably by
GraphViz: http://www.graphviz.org/. It is also used by Pig in their testing
framework, e.g.
http://www.programcreek.com/java-api-examples/index.php?api=org.apache.pig.test.utils.dotGraph.parser.DOTParser

We can use this notation for DAG specification; doing that for the archetype
generated application results in the body of the *populateDAG()* method
looking
like this:
--------------------------------------------
*    final String s =*
*      "digraph G {\n" +*
*      "randGen [class=\"com.example.myapexapp.RandomNumberGenerator\"];\n"
+*
*      "console
[class=\"com.datatorrent.lib.io.ConsoleOutputOperator\"];\n" +*
*      "randGen -> console [id=randomData, src=out, tgt=input]\n" +*
*      "}";*

*    StringBuffer sb = new StringBuffer(s);*
*    Builder.build(sb, dag);*
--------------------------------------------

For more complex DAGs, the DOT string defining the DAG will obviously get
longer but the
code remains the same -- just 2 lines. The benefits here are that the DAG
specification is
decoupled from the mechanics of language bindings and we would be using a
notation
that is already widely used for graph specification.

A sample proof-of-concept implementation is available at:
https://github.com/amberarrow/samples
The file *hilevel/README.md* describes how to build the application.

Ram

On Wed, Dec 23, 2015 at 1:42 PM, David Yan <da...@datatorrent.com> wrote:

> Hi fellow Apex developers:
>
> Apex has a comprehensive API for constructing DAG topologies for streaming
> applications, using operators, ports and streams.  But this may seem too
> much for folks who just want to build simple applications, or just to learn
> about Apex.  For example, when you compare the code to do word count in
> Apex with Spark Streaming or Flink, Apex requires much more code.
>
> Apex:
>
> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
>
> Spark Streaming:
>
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
>
> Flink:
>
> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
>
> Note that their Scala versions are even simpler to use.
>
> The high-level requirements I have in mind is as follow:
>
> 1. A simple-to-use high-level API similar to what Spark Streaming and Flink
> have. And from the high-level API, the Apex engine will construct the
> actual DAG topology at launch time.
>
> 2. The first language we will support is Java, but we will also want to
> support Scala and possibly Python at some point, so the high-level API
> should make it easy for implementing bindings for at least these two
> languages.
>
> 3. We should be able to use the high-level API in Apex App Package (apa)
> file, so that dtcli can launch it just like a regular apa today.
>
> Please provide your ideas and thoughts on this topic.
>
> Thanks,
>
> David
>