Posted to users@apex.apache.org by Amit Shah <am...@gmail.com> on 2016/01/25 11:40:57 UTC

What-if analysis with apex

Hello,

I am trying to evaluate Apache Apex for building an application that
offers what-if analysis to users. This correlates closely with
Excel-like functionality, where changing a value in one cell triggers
changes in other cell values. In our case, multiple rows in various
tables would be updated when the user changes a row value. The
response needs to be in real time or near real time.

Does Apex fit such a use case? If so, what would be some initial steps
to evaluate it for this use case?

Thanks!

Re: What-if analysis with apex

Posted by Pramod Immaneni <pr...@datatorrent.com>.

Amit,

You mentioned you have Oracle as the data store. In the past, Apex has been
used in conjunction with GoldenGate, which can track table and row changes
in Oracle and notify external entities.

GoldenGate publishes the changes to JMS, and you can have a JMS operator
in an Apex application receive those changes. You can then connect other
operators downstream of the JMS operator to perform your business logic.

Thanks

Re: What-if analysis with apex

Posted by Bhupesh Chawda <bh...@gmail.com>.

Hi Amit,

Thanks for the details.

The polling I was referring to was for identifying the user changes. Since
you mention that these changes come from a UI, I think you can send them
directly to an operator. In other words, the operator can listen for the
changes made through the UI.

Also, by the rule set, I mean the set of rules (a change to Table X, Row Y,
Col Z implies a change to Table D, Row E, Col F). I assume this is too
large to fit in memory. If so, we can keep it in another store that an
operator can query.

Here is the approach using Apex:

O1 -> O2 -> O3

   - O1 listens to UI changes and propagates them in the form of a triple
   (X, Y, Z) to O2. Here X -> Table, Y -> Row, Z -> Column.
   - O2 receives the triple from O1 and queries a data store to identify
   the set of target changes.
      - Assume that the flattened rule graph (X, Y, Z) -> [(D, E, F, C)] is
      stored in a data store which can be queried. (D, E, F) corresponds to
      the Table, Row and Column, while C refers to the recalculation formula.
   - Once O2 receives the changes that need to be made [(D, E, F)], it
   can forward them to O3 in the form of triples (D1, E1, F1), (D2, E2, F2),
   etc. O3 then takes in these records and makes the required changes to
   the appropriate target tables.
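
The three stages above can be sketched in plain Python to show the data that
flows between them. This is only an illustration of the logic, not Apex
operator code: the function names, the in-memory RULES dictionary standing in
for the rule store, and the two formulas are all assumptions made for the
example. Unlike the description above, the sketch forwards the formula name
along with each target so that O3 can recompute the value.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, str, str]          # (table, row, column)
Rule = Tuple[str, str, str, str]     # (table, row, column, formula name)

# Hypothetical flattened rule store: changed cell -> dependent cells with formula.
RULES: Dict[Cell, List[Rule]] = {
    ("B", "M", "10"): [("A", "X", "5", "double")],
    ("C", "P", "7"): [("A", "X", "5", "double"), ("D", "Q", "1", "negate")],
}

# Hypothetical formulas, keyed by the name stored in the rule.
FORMULAS = {"double": lambda v: v * 2, "negate": lambda v: -v}


def o1_emit(table: str, row: str, column: str) -> Cell:
    """O1: turn a UI change into an (X, Y, Z) triple."""
    return (table, row, column)


def o2_lookup(changed: Cell) -> List[Rule]:
    """O2: query the rule store for the target changes."""
    return RULES.get(changed, [])


def o3_apply(targets: List[Rule], new_value: float, store: Dict[Cell, float]) -> None:
    """O3: recompute each target cell and write it to the target store."""
    for table, row, column, formula in targets:
        store[(table, row, column)] = FORMULAS[formula](new_value)


store: Dict[Cell, float] = {}
o3_apply(o2_lookup(o1_emit("C", "P", "7")), 4.0, store)
print(store)
```

In a real application each function would be the body of an operator, and the
rule lookup would go to whichever external store holds the rules.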

Thanks
-Bhupesh

-- 
Regards,
Bhupesh Chawda

Re: What-if analysis with apex

Posted by Amit Shah <am...@gmail.com>.

Thanks Bhupesh for the reply.

Answering your questions first:

> what is the data store that you are using

We have Oracle as our data store. If we adopt a big data processing
platform like Apache Apex, I think we would also have to consider a
distributed in-memory data store such as MemSQL.

> how is the user change captured

User changes would be captured through a UI.

> how large is the rule set which you are considering for your use case?

We expect the user input to affect a number of tables (~10-20).

Following your suggestion on the DAG, I think I would have to expand the
graph to store row-level dependencies. For example, table A, row X, column
index 5 depends on table B, row M, column index 10 and on table C, row P,
column index 7, along with the formula that is used to recalculate the
value. So when the user changes the value of table B, row M, column
index 10, the value of table A, row X, column index 5 would be
recalculated based on the new value.
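
The row-level dependency in this example could be represented roughly as
follows. This is a minimal sketch in plain Python, assuming a sum as the
recalculation formula and an in-memory dictionary for cell values; neither is
stated in the thread. It builds a reverse index so that a changed cell leads
directly to the cells that depend on it, and it handles only one level of
dependency.

```python
from collections import defaultdict

# Forward dependencies: target cell -> (input cells, formula).
# The formula (a sum) is hypothetical.
DEPENDS_ON = {
    ("A", "X", 5): ([("B", "M", 10), ("C", "P", 7)], lambda b, c: b + c),
}

# Reverse index: input cell -> target cells to recalculate.
affected_by = defaultdict(list)
for target, (inputs, _formula) in DEPENDS_ON.items():
    for cell in inputs:
        affected_by[cell].append(target)

# Hypothetical current cell values.
values = {("B", "M", 10): 3, ("C", "P", 7): 4, ("A", "X", 5): 7}


def on_user_change(cell, new_value):
    """Apply a user edit and recalculate the cells that depend on it."""
    values[cell] = new_value
    recalculated = []
    for target in affected_by.get(cell, []):
        inputs, formula = DEPENDS_ON[target]
        values[target] = formula(*(values[c] for c in inputs))
        recalculated.append(target)
    return recalculated


changed = on_user_change(("B", "M", 10), 10)
print(changed, values[("A", "X", 5)])
```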

If I store this graph somewhere (?), I wouldn't have to poll the table for
changes, right? Once the user changes a particular table, I can query the
graph and determine the cell values that need to be recalculated. What do
you think?
If this maps to the approach you mentioned, does Apache Apex help with the
calculations?

Thanks,
Amit.

Re: What-if analysis with apex

Posted by Bhupesh Chawda <bh...@gmail.com>.

Hi Amit,

From what I understand, you can create a DAG like the following:

O1 -> O2

where O1 is an input operator and O2 is an output operator.

O1 listens for changes in values in one or more tables. I assume that the
user change will result in a change to some data in one of the tables.
Listening for a change may mean monitoring a set of tables, which can be
done by continuously or intermittently querying them. If this becomes very
I/O intensive, you can use something like database triggers that record
each change in a metadata table.
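
The trigger-plus-metadata-table idea can be sketched as below. SQLite is used
here only because it ships with Python; it stands in for whatever database is
actually in use, and the table names (prices, change_log), the trigger name
and the poll_changes helper are assumptions made for the example. The poller
remembers the last sequence number it has seen, so each poll reads only new
changes instead of rescanning the data tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT, row_id INTEGER, new_amount REAL
);
-- The trigger records every update in the metadata table.
CREATE TRIGGER prices_changed AFTER UPDATE ON prices
BEGIN
    INSERT INTO change_log (table_name, row_id, new_amount)
    VALUES ('prices', NEW.id, NEW.amount);
END;
INSERT INTO prices (id, amount) VALUES (1, 10.0), (2, 20.0);
""")


def poll_changes(connection, last_seq):
    """Return change_log rows newer than last_seq, plus the new high-water mark."""
    rows = connection.execute(
        "SELECT seq, table_name, row_id, new_amount FROM change_log "
        "WHERE seq > ? ORDER BY seq", (last_seq,)
    ).fetchall()
    return rows, (rows[-1][0] if rows else last_seq)


conn.execute("UPDATE prices SET amount = 12.5 WHERE id = 2")
changes, last_seq = poll_changes(conn, 0)
again, _ = poll_changes(conn, last_seq)
print(changes, again)
```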

Once the change is detected, O1 sends it to O2, which updates the
corresponding target stores according to a set of rules defined in it. The
set of rules that you define is analogous to Excel macros.

It would help if you also mentioned which data store you are using and how
the user change is captured. Also, how large is the rule set you are
considering for your use case?

-Bhupesh

-- 
Regards,
Bhupesh Chawda

Re: What-if analysis with apex

Posted by Amol Kekre <am...@datatorrent.com>.

Amit,
A priori knowledge of depth is extremely valuable for figuring out how to
partition. That is the core problem here, along with fault tolerance.

The DAG can be considered a collection of functional units with data
flowing as per connectivity. The two top-level ways to partition are:
1. Hardcode depth in the DAG: Assign specific tables/cells to a functional
unit, and scale by restricting particular tables/cells to each functional
unit and connecting the units properly. This depends heavily on the depth
being small; in effect you are hardcoding the depth. Whether you do this
statically or dynamically, the scale issue remains the same.
2. Keyless load balancing: Assign whole graphs to a functional unit and
scale by having lots of functional units. This scales very well as long as
the compute fits in one JVM, which it will unless your graph has millions
of nodes. Even then there are ways to use this approach and scale with load
balancing.
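
The difference between the two approaches can be illustrated with a small
plain-Python sketch. It is not how Apex assigns work internally; the function
names, the CRC32 hash and the round-robin policy are assumptions chosen to
show the idea. In approach 1 a cell key always lands on the same partition,
while in approach 2 whole dependency graphs are spread over partitions
without looking at any key.

```python
import zlib
from itertools import cycle


def key_partition(cell, num_partitions):
    """Approach 1: a cell always maps to the same partition (stable hash)."""
    encoded = "|".join(map(str, cell)).encode("utf-8")
    return zlib.crc32(encoded) % num_partitions


def keyless_assign(graphs, num_partitions):
    """Approach 2: hand whole dependency graphs to partitions round-robin."""
    assignment = {p: [] for p in range(num_partitions)}
    for graph_id, partition in zip(graphs, cycle(range(num_partitions))):
        assignment[partition].append(graph_id)
    return assignment


assignment = keyless_assign(["g1", "g2", "g3", "g4", "g5"], 2)
print(assignment)
```

With key-based routing, adding a level of depth means wiring another stage
between partitions; with keyless assignment, each partition evaluates its
graph end to end, so scaling is a matter of adding partitions.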

Ashwin has proposed something similar to #2, which scales very well as long
as the depth is not, say, millions of cells in each compute. This design
pattern is proven and scales very well (almost linearly). #1 will need work
to assign keys properly and relies on data flow. The final upload (bulk
upload is preferred) is similar in both.

At a logical level, this problem looks as follows.
Definitions
Key = K(table, row, column)
Input = set of keys (i.e. various cells in different tables, or even tables
in different DBs)
Output = updates to various keys (i.e. cells in different tables, or even
tables in different DBs) after applying basic Excel-like functionality

In your particular example
Input is {(t1,r1,c1), (t2,r2,c2), (t4,r4,c4)}
Output is {(t3,r3,c3), (t5,r5,c5), (t6,r6,c6)} = Excel(Inputs) // Note: even
though (t3,r3,c3) is an intermediate input, all that matters is that it
exists as part of the DB upload list
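
A minimal sketch of Output = Excel(Inputs) in plain Python follows. The
formulas and the wiring between cells are hypothetical, since the attached
diagram is not reproduced here; only the cell keys come from the example
above. The cells are evaluated in dependency order using the standard
library's graphlib (Python 3.9+), so the intermediate cell (t3,r3,c3) is
computed before the cells that read it and still appears in the upload list.

```python
from graphlib import TopologicalSorter

# Hypothetical formulas: each output cell lists its input cells and a function.
FORMULAS = {
    ("t3", "r3", "c3"): ([("t1", "r1", "c1"), ("t2", "r2", "c2")], lambda a, b: a + b),
    ("t5", "r5", "c5"): ([("t3", "r3", "c3"), ("t4", "r4", "c4")], lambda a, b: a * b),
    ("t6", "r6", "c6"): ([("t3", "r3", "c3")], lambda a: a - 1),
}


def excel(inputs):
    """Evaluate every formula cell in dependency order; return only the updates."""
    values = dict(inputs)
    # Map each formula cell to the cells it depends on (its predecessors).
    graph = {cell: deps for cell, (deps, _fn) in FORMULAS.items()}
    for cell in TopologicalSorter(graph).static_order():
        if cell in FORMULAS:
            deps, fn = FORMULAS[cell]
            values[cell] = fn(*(values[d] for d in deps))
    return {cell: values[cell] for cell in FORMULAS}


updates = excel({("t1", "r1", "c1"): 2, ("t2", "r2", "c2"): 3, ("t4", "r4", "c4"): 10})
print(updates)
```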

To scale, you need the following:
a. An architecture without bottlenecks when lots of cells are updated. For
example, a bulk upload to a table on the input side may trigger lots of
computations
b. No change in architecture as you scale; ideally you simply want to add
nodes to the cluster
c. The SLA being met
d. Conflict resolution/concurrency -> the same cell being updated at the
same time
e. No circular dependencies
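
For item d, one possible policy is to group pending updates by cell key and
keep only the most recent one before the bulk upload. The sketch below assumes
each update carries a timestamp and uses last-writer-wins; both are
assumptions made for illustration, and the function name is hypothetical. The
actual rule is a business decision.

```python
def resolve_conflicts(updates):
    """Keep one update per cell: the one with the highest timestamp.

    Each update is (cell, value, timestamp). Last-writer-wins is only one
    possible policy; the right rule is a business decision.
    """
    winners = {}
    for cell, value, ts in updates:
        # On equal timestamps the later entry in the batch wins.
        if cell not in winners or ts >= winners[cell][1]:
            winners[cell] = (value, ts)
    return {cell: value for cell, (value, ts) in winners.items()}


batch = [
    (("t3", "r3", "c3"), 10, 100),
    (("t3", "r3", "c3"), 12, 105),
    (("t5", "r5", "c5"), 7, 101),
]
resolved = resolve_conflicts(batch)
print(resolved)
```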

Apex will solve a, b and c for you natively in the platform. It will handle
scale, fault tolerance, quick time to market, operability, etc. Apache
Malhar has operators that can get you started. You can even leverage
subtle but critical features like graceful degradation, no data loss,
spooling to HDFS for later audits, etc. Apex handles the operational aspects
for you, which leaves you to deal only with business logic. As you correctly
guessed, single-threaded execution means you do not have to worry about
multi-threading issues. If you use #2, in Apex you can also upload new
function expressions on the fly without having to take the app down, as the
Excel function can be expressed in a file that is re-read as soon as it
changes. If you choose #1, the way out is to redo the DAG dynamically, which
may get very messy.

d and e are part of business logic. d is slightly easier with #1, but it is
essentially a business logic issue. d is affected by data flow latency,
ordering, etc., which the business logic needs to resolve. #2 can also handle
d by segregating the bulk upload by key and resolving conflicts before
upload. In either case, what to do when the same cell is written to by two
actions within a given time period is a business decision. e is a business
logic issue that simply cannot be solved by any code or platform. There are
algorithms that detect circular dependencies and flag the error, which you
can spool to HDFS for later audit without worrying about the app hanging.
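
One such algorithm is a depth-first search that flags a cell reached again
while it is still on the current path. The sketch below is a generic
plain-Python illustration, not something Apex or Malhar provides; the function
name and the spreadsheet-style cell names are made up for the example. It
returns the offending cycle so it can be logged for audit.

```python
def find_cycle(depends_on):
    """Return one dependency cycle as a list of cells, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = {}
    path = []

    def visit(cell):
        state[cell] = GREY
        path.append(cell)
        for dep in depends_on.get(cell, []):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path: we found a cycle.
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[cell] = BLACK
        return None

    for cell in list(depends_on):
        if state.get(cell, WHITE) == WHITE:
            found = visit(cell)
            if found:
                return found
    return None


acyclic = {"A1": ["B1", "C1"], "B1": ["C1"]}
cyclic = {"A1": ["B1"], "B1": ["C1"], "C1": ["A1"]}
print(find_cycle(acyclic), find_cycle(cyclic))
```

The recursion depth equals the depth of the dependency graph, which is small
for graphs that are 3-4 levels deep.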

Thks
Amol

On Thu, Jan 28, 2016 at 8:30 AM, Amit Shah <am...@gmail.com> wrote:

> Amol, these graphs could be 3-4 levels deep.
> What would this impact, and how?
>
> Thanks,
> Amit.
>
> On Thu, Jan 28, 2016 at 9:56 PM, Amol Kekre <am...@datatorrent.com> wrote:
>
>>
>> Amit,
>> How deep are these graphs?
>>
>> Thks
>> Amol
>>
>>
>> On Thu, Jan 28, 2016 at 6:49 AM, Amit Shah <am...@gmail.com> wrote:
>>
>>> Thanks Sandeep for the follow up. I have tried responding to your
>>> queries. Kindly let me know if that gives you an idea on what I am trying
>>> to achieve
>>>
>>> how you will be representing your dependencies in a graph
>>>
>>>
>>> Attached a sample dependency graph. I was assuming each cell to be
>>> represented as an operator in apex terms so that they could be executed in
>>> parallel
>>>
>>> How many such dependency graphs will be there?
>>>
>>>
>>> The total number of graphs would be approximately equal to the number of
>>> rows that could be modified by the user (considering the worst case). The
>>> number should be in the thousands.
>>>
>>> Do you have one graph per change of cell defining its dependent cells? So,
>>>> for the example you mentioned, do you define it as O1 dependent cells into
>>>> one graph? Then there is another graph which defines what values are
>>>> updated if some other cell O7 is updated.
>>>
>>>
>>> Yes approximately one graph per cell. The dependency graph I have tried
>>> presenting in the attached diagram could be executed if any of the cell
>>> values in table 1, 2 or 4 are updated. For simplicity I have picked up
>>> cells from distinct tables.
>>>
>>> In my view, once the user sees the tables on the UI, we could create the
>>> dependency graphs in the background. Once he/she updates a cell value, our
>>> application would figure out the corresponding dependency graph and start
>>> its execution by
>>> 1. Loading values for unmodified cells
>>> 2. Determining the cells (or operators) that are to be recalculated. For
>>> example, if the cell identified as table 1, row 1, column 1 is updated, the
>>> application would determine that 2 cell values are to be updated.
>>> 3. Executing the cells in parallel (if possible)
>>> 4. Rendering the updated values in real time to the user.
>>>
>>> Thanks,
>>> Amit.
>>>
>>> On Thu, Jan 28, 2016 at 7:28 PM, Sandeep Deshmukh <
>>> sandeep@datatorrent.com> wrote:
>>>
>>>> Hi Amit,
>>>>
>>>> Your concern is that a change to one cell will trigger updates to a
>>>> large number of cells, and you are interested in doing this in parallel
>>>> to get a real-time response. This can very well be achieved using Apex.
>>>>
>>>> I think we are still not very clear on your use case, and hence what we
>>>> have proposed may not match what you are looking for.
>>>>
>>>> We would like to know how you will be representing your dependencies in
>>>> a graph. How many such dependency graphs will be there? Do you have one
>>>> graph per change of cell defining its dependent cells? So, for the example
>>>> you mentioned, do you define it as O1 dependent cells into one graph? Then
>>>> there is another graph which defines what values are updated if some other
>>>> cell O7 is updated.
>>>>
>>>> Once we fully understand your requirements, we should be able to guide
>>>> you better.
>>>>
>>>>
>>>> Regards,
>>>> Sandeep
>>>>
>>>> On Thu, Jan 28, 2016 at 2:56 PM, Amit Shah <am...@gmail.com> wrote:
>>>>
>>>>> Ashwin, Below are follow up queries that I have based on your response.
>>>>>
>>>>> The store I mentioned is just an abstraction. It can be in memory
>>>>>> store, or a cache backed lookup from a database.
>>>>>
>>>>>
>>>>> Yes, I understand the term store, but I didn't follow the need for it.
>>>>>
>>>>>
>>>>> How does your UI interact with your server today?
>>>>>
>>>>>
>>>>> Our UI is built on AngularJS, so it communicates with the server
>>>>> through REST APIs.
>>>>>
>>>>> You dont have to create a new DAG for each cell you are changing. You
>>>>>> can have a single DAG running and send across your query with the cell
>>>>>> changes in the schema you define. You can perform all corresponding changes
>>>>>> for other cells/table rows in the store operator.
>>>>>
>>>>>
>>>>> I was under the impression that by defining one operator per column
>>>>> index I could take advantage of Apex running individual operators on
>>>>> individual JVMs, and hence get parallel writes with real-time or near
>>>>> real-time response. If we have a single static DAG that accepts the cell
>>>>> identifier (row ID, column index and table ID) as parameters, then we
>>>>> would not be able to update cell values concurrently, right?
>>>>> If your understanding is different from the flow I explained in my
>>>>> previous mail, what do I gain by using Apex?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Amit.
>>>>>
>>>>>
>>>>> On Thu, Jan 28, 2016 at 12:51 AM, Ashwin Chandra Putta <
>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>
>>>>>> Amit,
>>>>>>
>>>>>> The store I mentioned is just an abstraction. It can be in memory
>>>>>> store, or a cache backed lookup from a database.
>>>>>>
>>>>>> For the query/query response, when interacting with a UI, you can
>>>>>> send your queries to the query operator and listen for the response from
>>>>>> the query response operator. Historically we have used JSON over
>>>>>> WebSockets to interact from the browser. How does your UI interact with
>>>>>> your server today?
>>>>>>
>>>>>> You don't have to create a new DAG for each cell you are changing. You
>>>>>> can have a single DAG running and send across your query with the cell
>>>>>> changes in the schema you define. You can perform all corresponding changes
>>>>>> for other cells/table rows in the store operator.
>>>>>>
>>>>>> If you still want to depend completely on your existing server for
>>>>>> loading initial data, then you can load it into a cache in the store and
>>>>>> do your analysis on that data in memory.
>>>>>>
>>>>>> Regards,
>>>>>> Ashwin.
>>>>>>
>>>>>> On Wed, Jan 27, 2016 at 7:42 AM, Amol Kekre <am...@datatorrent.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Amit,
>>>>>>> Here are some answers
>>>>>>> - The logic that you want to run can be coded as a utility that is
>>>>>>> then invoked by any other operator
>>>>>>> - populateDAG() is today part of the rollout of the app, i.e. it is
>>>>>>> similar to "compileTime" and not "runTime". You could do it at runTime,
>>>>>>> but then you will need to go through dtcli. Today runTime changes via
>>>>>>> dtcli will need a lot more coding. A very early version of runTime
>>>>>>> changes (based on system metrics) exists, but the ask is for changes
>>>>>>> based on application data. That ask is on the roadmap of the module
>>>>>>> rollout (phase II?), and others can comment on the roadmap for runTime
>>>>>>> populateDAG.
>>>>>>> - Outputs of many operators can be streamed as input to one operator
>>>>>>> in the following ways
>>>>>>>    - If each output has a different schema, that operator needs a
>>>>>>> separate input port per output, as the port schema is fixed. This is
>>>>>>> fine, but will clutter the DAG
>>>>>>>    - If the schema of these output ports is the same, there is a merge
>>>>>>> operator that does that (
>>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/stream/StreamMerger.java).
>>>>>>> You can write one for Nx1 merge by extending the above class.
>>>>>>>
>>>>>>> Thks,
>>>>>>> Amol
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 27, 2016 at 6:03 AM, Amit Shah <am...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Ashwin for the follow up.
>>>>>>>> I am not sure I completely follow the query -> store -> query
>>>>>>>> pattern. What does query mean here? Why would we need an in-memory store?
>>>>>>>> Trying to list down the flow, I came up with the points below
>>>>>>>>
>>>>>>>>    1. We need to build a DAG after we get to know the cell (table,
>>>>>>>>    row and column index) that is modified by the user.
>>>>>>>>    2. Once we receive user input (i.e. once the user modifies a
>>>>>>>>    value in a table) the populateDAG() method should be called.
>>>>>>>>    3. The populateDAG() implementation would
>>>>>>>>       1. Determine what cells should be updated across all tables
>>>>>>>>       2. Create an Operator per cell that is affected by the
>>>>>>>>       change. From the demo code I see the dag.addOperator method
>>>>>>>>       instantiating an operator. Since the logic to update a cell
>>>>>>>>       would be the same across tables, how do we create new operators
>>>>>>>>       per cell to have a graph that looks like what Bhupesh envisioned
>>>>>>>>       in his last email reply? In my view the graph would look like
>>>>>>>>
>>>>>>>>          O1 (for user modified cell)
>>>>>>>>            -> O2 (table X, row Y, column index 2)
>>>>>>>>                 -> O5 (table E, row F, column index 10000) -> O6 (update UI)
>>>>>>>>            -> O3 (table M, row N, column index 3) -> O6 (update UI)
>>>>>>>>            -> O4 (table P, row Q, column index 1) -> O6 (update UI)
>>>>>>>>
>>>>>>>>       3. We want the DAG to be evaluated instantly once
>>>>>>>>       the populateDAG() method finishes. How do we do it?
>>>>>>>>       4. Can outputs from many operators be streamed as an
>>>>>>>>       input to one operator? In the above example, outputs from O3, O4
>>>>>>>>       and O5 need to go to O6.
>>>>>>>>
>>>>>>>> I appreciate your inputs on this.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Amit.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 27, 2016 at 1:49 PM, Ashwin Chandra Putta <
>>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Amit,
>>>>>>>>>
>>>>>>>>> Thanks for the response. You can use the query --> store --> query
>>>>>>>>> result pattern to do the real time updates and lookups for what-if analysis.
>>>>>>>>>
>>>>>>>>> And you can also ingest your real time input data to the store
>>>>>>>>> operator. input --> store.
>>>>>>>>>
>>>>>>>>> That way, you can keep ingesting your data into the store operator
>>>>>>>>> where you will keep your OLAP dimensions and measures.
>>>>>>>>>
>>>>>>>>> For the query/query result pattern example, see this demo:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/demos/mobile/src/main/java/com/datatorrent/demos/mobile/Application.java
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ashwin.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 26, 2016 at 9:52 PM, Amit Shah <am...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Appreciate the discussion we are having on this topic.
>>>>>>>>>>
>>>>>>>>>> Bhupesh, if I understand the flow correctly, we would have to
>>>>>>>>>> define one DAG per cell in the table that could be modified by the user.
>>>>>>>>>> Given this, it would be right to define the DAG only when the table is
>>>>>>>>>> presented to the user on the UI (not at definition time, since there would
>>>>>>>>>> be many tables). Would it be possible to define the DAG at runtime, i.e.
>>>>>>>>>> defining and wiring the operators at runtime?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ashwin, I am glad to answer these questions
>>>>>>>>>>
>>>>>>>>>> 1. We are extending our OLTP-based application by introducing
>>>>>>>>>> analytical features that include what-if analysis. Other features
>>>>>>>>>> include OLAP operations like aggregation, slice and dice, drill
>>>>>>>>>> down/up and pivoting. Our first milestone targets what-if analysis.
>>>>>>>>>> We don't have any implementation so far. We are exploring solutions
>>>>>>>>>> to these requirements
>>>>>>>>>> 2. The technical challenges include building an in-memory
>>>>>>>>>> calculation engine that supports parallel writes and provides real-time
>>>>>>>>>> or near real-time response.
>>>>>>>>>>
>>>>>>>>>> Hope that answers your queries.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Amit.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 25, 2016 at 10:26 PM, Ashwin Chandra Putta <
>>>>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Amit,
>>>>>>>>>>>
>>>>>>>>>>> I have a couple of questions, if it's not too much.
>>>>>>>>>>>
>>>>>>>>>>> 1. What is the current implementation?
>>>>>>>>>>> 2. What are the challenges you are facing?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ashwin.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ashwin.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Regards,
>>>>>> Ashwin.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: What-if analysis with apex

Posted by Amit Shah <am...@gmail.com>.

Amol, these graphs could be 3-4 levels deep.
What would this impact, and how?

Thanks,
Amit.

On Thu, Jan 28, 2016 at 9:56 PM, Amol Kekre <am...@datatorrent.com> wrote:

>
> Amit,
> How deep are these graphs?
>
> Thks
> Amol
>
>
> On Thu, Jan 28, 2016 at 6:49 AM, Amit Shah <am...@gmail.com> wrote:
>
>> Thanks Sandeep for the follow up. I have tried responding to your
>> queries. Kindly let me know if that gives you an idea on what I am trying
>> to achieve
>>
>> how you will be representing your dependencies in a graph
>>
>>
>> Attached a sample dependency graph. I was assuming each cell to be
>> represented as an operator in apex terms so that they could be executed in
>> parallel
>>
>> How many such dependency graphs will be there?
>>
>>
>> The total number of graphs would be approximately equal to the number of rows
>> that could be modified by the user (considering the worst case). The number
>> should be in the thousands.
>>
>> Do you have one graph per change of cell defining its dependent cells? So,
>>> for the example you mentioned, do you define it as O1 dependent cells into
>>> one graph? Then there is another graph which defines what values are
>>> updated if some other cell O7 is updated.
>>
>>
>> Yes, approximately one graph per cell. The dependency graph I have tried
>> to present in the attached diagram would be executed if any of the cell
>> values in table 1, 2 or 4 is updated. For simplicity I have picked
>> cells from distinct tables.
>>
>> In my view, once the user sees the tables on the UI, we could create the
>> dependency graphs in the background. Once he/she updates a cell value, our
>> application would figure out its corresponding dependency graph and start
>> its execution by
>> 1. Loading values for unmodified cells
>> 2. Determine the cells (or operators) that are to be recalculated. For
>> example, if the cell with identifier (table 1, row 1, column 1) is updated, the
>> application would determine that 2 cell values are to be updated.
>> 3. Execute the cells in parallel (if possible)
>> 4. Render the updated values in real time to the user.
>>
>> Thanks,
>> Amit.
>>
>> On Thu, Jan 28, 2016 at 7:28 PM, Sandeep Deshmukh <
>> sandeep@datatorrent.com> wrote:
>>
>>> Hi Amit,
>>>
>>> Your concern is that change of one cell is going to trigger update for
>>> large number of cells and you are interested in doing this in parallel to
>>> get real-time response. This can be very well achieved using Apex.
>>>
>>> I think we are still not very clear on your use case and hence what we
>>> have proposed may not match what you are looking for.
>>>
>>> We would like to know how you will be representing your dependencies in
>>> a graph. How many such dependency graphs will be there? Do you have one
>>> graph per change of cell defining its dependent cells? So, for the example
>>> you mentioned, do you define it as O1 dependent cells into one graph? Then
>>> there is another graph which defines what values are updated if some other
>>> cell O7 is updated.
>>>
>>> Once we fully understand your requirements, we should be able to guide
>>> you better.
>>>
>>>
>>> Regards,
>>> Sandeep
>>>
>>> On Thu, Jan 28, 2016 at 2:56 PM, Amit Shah <am...@gmail.com> wrote:
>>>
>>>> Ashwin, Below are follow up queries that I have based on your response.
>>>>
>>>> The store I mentioned is just an abstraction. It can be in memory
>>>>> store, or a cache backed lookup from a database.
>>>>
>>>>
>>>> Yes, I understand the term store, but I didn't follow the need for it.
>>>>
>>>> How does your UI interact with your server today?
>>>>
>>>>
>>>> Our UI is built over angularjs so it communicates with the server
>>>> through REST api's.
>>>>
>>>> You don't have to create a new DAG for each cell you are changing. You
>>>>> can have a single DAG running and send across your query with the cell
>>>>> changes in the schema you define. You can perform all corresponding changes
>>>>> for other cells/table rows in the store operator.
>>>>
>>>>
>>>> I was under the impression that by defining one operator per column
>>>> index I could take advantage of Apex running individual operators on
>>>> individual JVMs, and hence get parallel writes with real-time or near
>>>> real-time response. If we have a single static DAG that accepts the cell
>>>> identifier (row id, column index and table id) as parameters, then we would
>>>> not be able to update cell values concurrently, right?
>>>> If your understanding is different from the flow I explained in my
>>>> previous mail, what do I gain by using Apex?
>>>>
>>>>
>>>> Thanks,
>>>> Amit.
>>>>
>>>>
>>>> On Thu, Jan 28, 2016 at 12:51 AM, Ashwin Chandra Putta <
>>>> ashwinchandrap@gmail.com> wrote:
>>>>
>>>>> Amit,
>>>>>
>>>>> The store I mentioned is just an abstraction. It can be in memory
>>>>> store, or a cache backed lookup from a database.
>>>>>
>>>>> For the query/query response, when interacting with a UI - you can
>>>>> send your queries to the query operator and listen for response from the
>>>>> query response operator. Historically we have used json over websockets to
>>>>> interact from browser. How does your UI interact with your server today?
>>>>>
>>>>> You don't have to create a new DAG for each cell you are changing. You
>>>>> can have a single DAG running and send across your query with the cell
>>>>> changes in the schema you define. You can perform all corresponding changes
>>>>> for other cells/table rows in the store operator.
>>>>>
>>>>> If you still want to depend completely on your existing server for
>>>>> loading initial data, then you can load it to a cache in store and do your
>>>>> analysis on that data in memory.
>>>>>
>>>>> Regards,
>>>>> Ashwin.
>>>>>
>>>>> On Wed, Jan 27, 2016 at 7:42 AM, Amol Kekre <am...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Amit,
>>>>>> Here are some answers
>>>>>> - Logic that you want to run can be coded as a utility that is then
>>>>>> invoked by any other operator.
>>>>>> - populateDAG() is today part of the rollout of the app, i.e. it is
>>>>>> similar to "compileTime" and not "runTime". You could do runTime, but then
>>>>>> you will need to go through dtcli. Today runTime changes via dtcli will
>>>>>> need a lot more coding. A very early version of runTime changes (based on
>>>>>> system metrics) exists, but the ask is for changes based on application
>>>>>> data. That ask is in the roadmap of module rollout (phase II?) and others
>>>>>> can comment on the roadmap for runTime populateDAG.
>>>>>> - Outputs of many operators can be streamed as input to one operator
>>>>>> in the following ways:
>>>>>>    - Each output having a different schema will mean different input
>>>>>> ports on that operator, as port schema is fixed. This is fine, but will
>>>>>> clutter the DAG.
>>>>>>    - If the schema of these output ports is the same, there is a merge
>>>>>> operator that does that (
>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/stream/StreamMerger.java).
>>>>>> You can write one for Nx1 merge by extending the above class.
>>>>>>
>>>>>> Thks,
>>>>>> Amol
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 27, 2016 at 6:03 AM, Amit Shah <am...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Ashwin for the follow up.
>>>>>>> I am not sure if I completely follow the query -> store -> query
>>>>>>> pattern. What does query mean here? Why would we need a in-memory store?
>>>>>>> Trying to list down the flow I came up with below points
>>>>>>>
>>>>>>>    1. We need to build a DAG after we get to know the cell (table,
>>>>>>>    row and column index) that is modified by the user.
>>>>>>>    2. Once we receive user input (i.e. once the user modifies a
>>>>>>>    value in a table) the populateDAG() method should be called.
>>>>>>>    3. The populateDAG() implementation would
>>>>>>>    1. Determine what cells should be updated across all tables
>>>>>>>       2. Create an Operator per cell that is affected by the
>>>>>>>       change. From the demo code I see dag.addOperator method
>>>>>>>       instantiating an operator. Since the logic to update an cell
>>>>>>>       would be the same across tables how do we create new operators per cell to
>>>>>>>       have a graph that looks what Bhupesh envisioned in his last email reply? In
>>>>>>>       my view the graph would like
>>>>>>>
>>>>>>>                     O1 (for user modified cell) -> O2 (table X, row
>>>>>>> Y, column index 2) -> O5 (table E, row F, column index 10000)
>>>>>>>                                                                O3
>>>>>>> (table M, row N, column index 3)
>>>>>>>                       ->  O6 (update UI)
>>>>>>>                                                                O4
>>>>>>> (table P, row Q, column index 1)
>>>>>>>
>>>>>>>               3. We want the DAG to be evaluated instantly once the
>>>>>>> populateDAG() method finishes. How do we do it?
>>>>>>>               4. Can outputs from many operators be streamed as an
>>>>>>> input to one operator? From the above example outputs from O3, O4
>>>>>>> and O5 need to go to O6.
>>>>>>>
>>>>>>> I appreciate your inputs on this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Amit.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 27, 2016 at 1:49 PM, Ashwin Chandra Putta <
>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>
>>>>>>>> Amit,
>>>>>>>>
>>>>>>>> Thanks for the response. You can use the query --> store --> query
>>>>>>>> result pattern to do the real time updates and lookups for what-if analysis.
>>>>>>>>
>>>>>>>> And you can also ingest your real time input data to the store
>>>>>>>> operator. input --> store.
>>>>>>>>
>>>>>>>> That way, you can keep ingesting your data into the store operator
>>>>>>>> where you will keep your OLAP dimensions and measures.
>>>>>>>>
>>>>>>>> For the query/query result pattern example, see this demo:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/demos/mobile/src/main/java/com/datatorrent/demos/mobile/Application.java
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashwin.
>>>>>>>>
>>>>>>>> On Tue, Jan 26, 2016 at 9:52 PM, Amit Shah <am...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Appreciate the discussion we are having on this topic.
>>>>>>>>>
>>>>>>>>> Bhupesh, If I understand the flow correctly, we would have to
>>>>>>>>> define one DAG per cell in the table that could be modified by the user.
>>>>>>>>> Given this, it would be right to define the DAG only when the table is
>>>>>>>>> presented to the user on the UI (not at definition time since there would
>>>>>>>>> be many tables). Would it be possible to define DAG at runtime i.e.
>>>>>>>>> defining & wiring the operators at runtime?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ashwin, I am glad to answer these questions
>>>>>>>>>
>>>>>>>>> 1. We are extending our OLTP-based application by introducing
>>>>>>>>> analytical features that include what-if analysis. Other features
>>>>>>>>> include OLAP operations such as aggregation, slice and
>>>>>>>>> dice, drill down/up, and pivoting. Our first milestone targets what-if
>>>>>>>>> analysis. We don't have any implementation so far. We are exploring
>>>>>>>>> solutions to these requirements.
>>>>>>>>> 2. The technical challenges include building an in-memory
>>>>>>>>> calculation engine that supports parallel writes and provides real-time
>>>>>>>>> or near-real-time response.
>>>>>>>>>
>>>>>>>>> Hope that answers your queries.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Amit.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 25, 2016 at 10:26 PM, Ashwin Chandra Putta <
>>>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Amit,
>>>>>>>>>>
>>>>>>>>>> I have a couple of questions, if it's not too much.
>>>>>>>>>>
>>>>>>>>>> 1. What is the current implementation?
>>>>>>>>>> 2. What are the challenges you are facing?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ashwin.
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I am trying to evaluate Apache Apex for building an application
>>>>>>>>>> that offers what-if analysis to users. This correlates closely
>>>>>>>>>> with Excel-like functionality, where changing a value in one cell
>>>>>>>>>> triggers changes in other cell values. In our case we would have multiple
>>>>>>>>>> rows in various tables getting updated when the user changes a row value.
>>>>>>>>>> The response needs to be in real time or near real time.
>>>>>>>>>>
>>>>>>>>>> Does Apex fit such a use case? If so, what would be some
>>>>>>>>>> initial steps to evaluate it for this use case?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashwin.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>> Ashwin.
>>>>>
>>>>
>>>>
>>>
>>
>

Re: What-if analysis with apex

Posted by Amol Kekre <am...@datatorrent.com>.

Amit,
How deep are these graphs?

Thks
Amol



Re: What-if analysis with apex

Posted by Amit Shah <am...@gmail.com>.

Please see my responses below.

>1. Loading values for unmodified cells
> What is the source of these unmodified cells?


Table values. Taking an example from the diagram: if the user modifies the
cell with identifier (table 1, row 1, column 1), we would have to load the
values of the unmodified cells (table 2, row 2, column 2) and (table 4, row
4, column 4) to recalculate the values of the other cells.

> 3. Execute the cells in parallel (if possible)
> Which cells are you referring to? (Table 1, row 1, column 1), that is, the
> changed cell that triggers recalculation of its dependent cells, or the
> two dependent cells?


Modifying the cell with identifier (table 1, row 1, column 1)
would trigger recalculation of the cells (table 3, row 3, column 3)
and (table 6, row 6, column 6). In this example we cannot evaluate them in
parallel, but you could imagine a case where parallel calculations
are possible.
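
To make the example above concrete, here is a minimal sketch in plain Python
(not Apex code). The attached diagram is not reproduced here, so the wiring is
an assumption: cell (3, 3, 3) is computed from (1, 1, 1) and (2, 2, 2), and
cell (6, 6, 6) is computed from (3, 3, 3) and (4, 4, 4). The formulas (a sum
and a product) and the names FORMULAS, affected_cells and recalculate are
hypothetical placeholders.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Cell ids are (table, row, column) tuples, as in the thread.
T1, T2, T3, T4, T6 = (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (6, 6, 6)

# Derived cell -> (input cells, formula). Wiring and formulas are assumed.
FORMULAS = {
    T3: ((T1, T2), lambda a, b: a + b),
    T6: ((T3, T4), lambda a, b: a * b),
}


def affected_cells(changed):
    """Return the derived cells downstream of `changed`, inputs first."""
    # Reverse edges: input cell -> derived cells that read it.
    dependents = {}
    for cell, (inputs, _) in FORMULAS.items():
        for source in inputs:
            dependents.setdefault(source, []).append(cell)

    # Collect every derived cell reachable from the changed cell.
    reachable, stack = set(), [changed]
    while stack:
        for cell in dependents.get(stack.pop(), []):
            if cell not in reachable:
                reachable.add(cell)
                stack.append(cell)

    # Order them so that each cell comes after the cells it depends on.
    graph = {c: [i for i in FORMULAS[c][0] if i in reachable] for c in reachable}
    return list(TopologicalSorter(graph).static_order())


def recalculate(values, changed, new_value):
    """Return a copy of `values` with the change applied and propagated."""
    updated = dict(values)
    updated[changed] = new_value
    for cell in affected_cells(changed):
        inputs, formula = FORMULAS[cell]
        # Unmodified inputs (e.g. T2, T4) are simply read from the table values.
        updated[cell] = formula(*(updated[i] for i in inputs))
    return updated


table_values = {T1: 1, T2: 2, T3: 3, T4: 4, T6: 12}
result = recalculate(table_values, T1, 10)
print(result[T3], result[T6])
```

Under these assumptions (6, 6, 6) has to wait for (3, 3, 3), which is why this
example cannot be evaluated in parallel. Cells in the ordering that do not
depend on each other could be computed concurrently.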

Thanks,
Amit.

On Thu, Jan 28, 2016 at 9:20 PM, Sandeep Deshmukh <sa...@datatorrent.com>
wrote:

> Thanks Amit. We have a better understanding of your requirements now.
>
> It is not necessary that each cell be one operator. Please don't get
> biased by that assumption.
>
> Here are a few more queries.
> >1. Loading values for unmodified cells
> What is the source of these unmodified cells?
>
> > 3. Execute the cells in parallel (if possible)
> Which cells are you referring to? (Table 1, row 1, column 1), that is, the
> changed cell that triggers recalculation of its dependent cells, or the
> two dependent cells?
>
> Regards
> Sandeep
> On 28-Jan-2016 8:20 pm, "Amit Shah" <am...@gmail.com> wrote:
>
>> Thanks Sandeep for the follow up. I have tried responding to your
>> queries. Kindly let me know if that gives you an idea on what I am trying
>> to achieve
>>
>> how you will be representing your dependencies in a graph
>>
>>
>> Attached a sample dependency graph. I was assuming each cell to be
>> represented as an operator in apex terms so that they could be executed in
>> parallel
>>
>> How many such dependency graphs will be there?
>>
>>
>> The total number of graphs would be approximately equal to the number of rows
>> that could be modified by the user (considering the worst case). The number
>> should be in the thousands.
>>
>> Do you have one graph per change of cell defining its dependent cells? So,
>>> for the example you mentioned, do you define it as O1 dependent cells into
>>> one graph? Then there is another graph which defines what values are
>>> updated if some other cell O7 is updated.
>>
>>
>> Yes, approximately one graph per cell. The dependency graph I have tried
>> to present in the attached diagram would be executed if any of the cell
>> values in table 1, 2 or 4 is updated. For simplicity I have picked
>> cells from distinct tables.
>>
>> In my view, once the user sees the tables on the UI, we could create the
>> dependency graphs in the background. Once he/she updates a cell value, our
>> application would figure out its corresponding dependency graph and start
>> its execution by
>> 1. Loading values for unmodified cells
>> 2. Determine the cells (or operators) that are to be recalculated. For
>> example, if the cell with identifier (table 1, row 1, column 1) is updated, the
>> application would determine that 2 cell values are to be updated.
>> 3. Execute the cells in parallel (if possible)
>> 4. Render the updated values in real time to the user.
>>
>> Thanks,
>> Amit.
>>
>> On Thu, Jan 28, 2016 at 7:28 PM, Sandeep Deshmukh <
>> sandeep@datatorrent.com> wrote:
>>
>>> Hi Amit,
>>>
>>> Your concern is that change of one cell is going to trigger update for
>>> large number of cells and you are interested in doing this in parallel to
>>> get real-time response. This can be very well achieved using Apex.
>>>
>>> I think we are still not very clear on your use case and hence what we
>>> have proposed may not match what you are looking for.
>>>
>>> We would like to know how you will be representing your dependencies in
>>> a graph. How many such dependency graphs will be there? Do you have one
>>> graph per change of cell defining its dependent cells? So, for the example
>>> you mentioned, do you define it as O1 dependent cells into one graph? Then
>>> there is another graph which defines what values are updated if some other
>>> cell O7 is updated.
>>>
>>> Once we fully understand your requirements, we should be able to guide
>>> you better.
>>>
>>>
>>> Regards,
>>> Sandeep
>>>
>>> On Thu, Jan 28, 2016 at 2:56 PM, Amit Shah <am...@gmail.com> wrote:
>>>
>>>> Ashwin, Below are follow up queries that I have based on your response.
>>>>
>>>> The store I mentioned is just an abstraction. It can be in memory
>>>>> store, or a cache backed lookup from a database.
>>>>
>>>>
>>>> Yes, I understand the term store, but I didn't follow the need for it.
>>>>
>>>> How does your UI interact with your server today?
>>>>
>>>>
>>>> Our UI is built over angularjs so it communicates with the server
>>>> through REST api's.
>>>>
>>>> You don't have to create a new DAG for each cell you are changing. You
>>>>> can have a single DAG running and send across your query with the cell
>>>>> changes in the schema you define. You can perform all corresponding changes
>>>>> for other cells/table rows in the store operator.
>>>>
>>>>
>>>> I was under the impression that by defining one operator per column
>>>> index I could take advantage of Apex running individual operators on
>>>> individual JVMs, and hence get parallel writes with real-time or near
>>>> real-time response. If we have a single static DAG that accepts the cell
>>>> identifier (row id, column index and table id) as parameters, then we would
>>>> not be able to update cell values concurrently, right?
>>>> If your understanding is different from the flow I explained in my
>>>> previous mail, what do I gain by using Apex?
>>>>
>>>>
>>>> Thanks,
>>>> Amit.
>>>>
>>>>
>>>> On Thu, Jan 28, 2016 at 12:51 AM, Ashwin Chandra Putta <
>>>> ashwinchandrap@gmail.com> wrote:
>>>>
>>>>> Amit,
>>>>>
>>>>> The store I mentioned is just an abstraction. It can be in memory
>>>>> store, or a cache backed lookup from a database.
>>>>>
>>>>> For the query/query response, when interacting with a UI - you can
>>>>> send your queries to the query operator and listen for response from the
>>>>> query response operator. Historically we have used json over websockets to
>>>>> interact from browser. How does your UI interact with your server today?
>>>>>
>>>>> You don't have to create a new DAG for each cell you are changing. You
>>>>> can have a single DAG running and send across your query with the cell
>>>>> changes in the schema you define. You can perform all corresponding changes
>>>>> for other cells/table rows in the store operator.
>>>>>
>>>>> If you still want to depend completely on your existing server for
>>>>> loading initial data, then you can load it to a cache in store and do your
>>>>> analysis on that data in memory.
>>>>>
>>>>> Regards,
>>>>> Ashwin.
>>>>>
>>>>> On Wed, Jan 27, 2016 at 7:42 AM, Amol Kekre <am...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Amit,
>>>>>> Here are some answers
>>>>>> - Logic that you want to run can be coded as a utility that is then
>>>>>> invoked by any other operator.
>>>>>> - populateDAG() is today part of the rollout of the app, i.e. it is
>>>>>> similar to "compileTime" and not "runTime". You could do runTime, but then
>>>>>> you will need to go through dtcli. Today runTime changes via dtcli will
>>>>>> need a lot more coding. A very early version of runTime changes (based on
>>>>>> system metrics) exists, but the ask is for changes based on application
>>>>>> data. That ask is in the roadmap of module rollout (phase II?) and others
>>>>>> can comment on the roadmap for runTime populateDAG.
>>>>>> - Outputs of many operators can be streamed as input to one operator
>>>>>> in the following ways:
>>>>>>    - Each output having a different schema will mean different input
>>>>>> ports on that operator, as port schema is fixed. This is fine, but will
>>>>>> clutter the DAG.
>>>>>>    - If the schema of these output ports is the same, there is a merge
>>>>>> operator that does that (
>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/stream/StreamMerger.java).
>>>>>> You can write one for Nx1 merge by extending the above class.
>>>>>>
>>>>>> Thks,
>>>>>> Amol
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 27, 2016 at 6:03 AM, Amit Shah <am...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Ashwin for the follow up.
>>>>>>> I am not sure if I completely follow the query -> store -> query
>>>>>>> pattern. What does query mean here? Why would we need a in-memory store?
>>>>>>> Trying to list down the flow I came up with below points
>>>>>>>
>>>>>>>    1. We need to build a DAG after we get to know the cell (table,
>>>>>>>    row and column index) that is modified by the user.
>>>>>>>    2. Once we receive user input (i.e. once the user modifies a
>>>>>>>    value in a table) the populateDAG() method should be called.
>>>>>>>    3. The populateDAG() implementation would
>>>>>>>       1. Determine what cells should be updated across all tables
>>>>>>>       2. Create an Operator per cell that is affected by the
>>>>>>>       change. From the demo code I see the dag.addOperator method
>>>>>>>       instantiating an operator. Since the logic to update a cell
>>>>>>>       would be the same across tables, how do we create new
>>>>>>>       operators per cell to have a graph that looks like what
>>>>>>>       Bhupesh envisioned in his last email reply? In my view the
>>>>>>>       graph would look like
>>>>>>>
>>>>>>>       O1 (for user modified cell)
>>>>>>>         -> O2 (table X, row Y, column index 2)
>>>>>>>              -> O5 (table E, row F, column index 10000) -> O6 (update UI)
>>>>>>>         -> O3 (table M, row N, column index 3) -> O6 (update UI)
>>>>>>>         -> O4 (table P, row Q, column index 1) -> O6 (update UI)
>>>>>>>
>>>>>>>       3. We want the DAG to be evaluated instantly once the
>>>>>>>       populateDAG() method finishes. How do we do it?
>>>>>>>       4. Can outputs from many operators be streamed as an
>>>>>>>       input to one operator? From the above example outputs from O3, O4
>>>>>>>       and O5 need to go to O6.
>>>>>>>
>>>>>>> I appreciate your inputs on this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Amit.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 27, 2016 at 1:49 PM, Ashwin Chandra Putta <
>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>
>>>>>>>> Amit,
>>>>>>>>
>>>>>>>> Thanks for the response. You can use the query --> store --> query
>>>>>>>> result pattern to do the real time updates and lookups for what-if analysis.
>>>>>>>>
>>>>>>>> And you can also ingest your real time input data to the store
>>>>>>>> operator. input --> store.
>>>>>>>>
>>>>>>>> That way, you can keep ingesting your data into the store operator
>>>>>>>> where you will keep your OLAP dimensions and measures.
>>>>>>>>
>>>>>>>> For the query/query result pattern example, see this demo:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/demos/mobile/src/main/java/com/datatorrent/demos/mobile/Application.java
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashwin.
>>>>>>>>
>>>>>>>> On Tue, Jan 26, 2016 at 9:52 PM, Amit Shah <am...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Appreciate the discussion we are having on this topic.
>>>>>>>>>
>>>>>>>>> Bhupesh, If I understand the flow correctly, we would have to
>>>>>>>>> define one DAG per cell in the table that could be modified by the user.
>>>>>>>>> Given this, it would be right to define the DAG only when the table is
>>>>>>>>> presented to the user on the UI (not at definition time since there would
>>>>>>>>> be many tables). Would it be possible to define DAG at runtime i.e.
>>>>>>>>> defining & wiring the operators at runtime?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ashwin, I am glad to answer these questions
>>>>>>>>>
>>>>>>>>> 1. We are extending our OLTP based application by introducing
>>>>>>>>> analytical features that includes what-if kind of analysis. Other features
>>>>>>>>> do include performing OLAP kind of operations like aggregation, slice &
>>>>>>>>> dice, drill down/up, pivoting. Our first milestone is to target what-if
>>>>>>>>> kind of analysis. We don't have any implementation so far. We are exploring
>>>>>>>>> out solutions to these requirements
>>>>>>>>> 2. The technical challenges we have include having an in-memory
>>>>>>>>> calculation engine system that supports parallel writes and provides real
>>>>>>>>> time or near real time response time.
>>>>>>>>>
>>>>>>>>> Hope that answers your queries.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Amit.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 25, 2016 at 10:26 PM, Ashwin Chandra Putta <
>>>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Amit,
>>>>>>>>>>
>>>>>>>>>> I have a couple of questions if its not much.
>>>>>>>>>>
>>>>>>>>>> 1. What is the current implementation?
>>>>>>>>>> 2. What are the challenges you are facing?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ashwin.
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I am trying to evaluate apache apex for building an application
>>>>>>>>>> that supports what-if analysis support to users. This co-relates closed
>>>>>>>>>> with excel kind of functionality where changing a value in one cell
>>>>>>>>>> triggers changes in other cell values. In our case we would have multiple
>>>>>>>>>> rows in various tables getting updated when the user changes a row value.
>>>>>>>>>> The response needs to be in real-time or near real-time.
>>>>>>>>>>
>>>>>>>>>> Does Apex fit such an use-case? If so, what would be some of
>>>>>>>>>> initial steps to evaluate it for this use case?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashwin.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>> Ashwin.
>>>>>
>>>>
>>>>
>>>
>>

Re: What-if analysis with apex

Posted by Ashwin Chandra Putta <as...@gmail.com>.

Thanks Amol and Sandeep for chiming in.

Amit,

From the discussion, here is my understanding.

You have a database that holds all the rows across multiple tables. When a
cell is changed on the UI, you want to update all the corresponding rows in
the database. You want these updates to be done concurrently and in real
time, using your existing REST API (as the current process seems to take
more time). You also want to update the UI with the updated cell values
from the database.

Please correct me if anything is not in line with what you are looking for.

Apex as a platform provides partitioning of operators to achieve parallel
processing. When an operator is partitioned, multiple instances of the
operator are run. For example, you can run 4 instances of an operator and
distribute the load across these 4 partitions. So you can write a function
once and run it in parallel in multiple partitions.
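
To make the partitioning idea concrete, here is a minimal plain-Java sketch
of routing cell changes to 4 partitions by key. It does not use the Apex
partitioning API; the class PartitionRoutingSketch, the CellChange type and
the choice of (table, row, column) as the routing key are all assumptions
made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Plain-Java illustration of key-based routing; this is NOT the Apex API.
final class PartitionRoutingSketch {

    // Hypothetical request: one cell change identified by table, row, column.
    static final class CellChange {
        final String table;
        final long row;
        final int column;

        CellChange(String table, long row, int column) {
            this.table = table;
            this.row = row;
            this.column = column;
        }
    }

    // Map a cell to a partition index in [0, partitionCount).
    // Math.floorMod avoids negative indexes when the hash is negative.
    static int partitionFor(String table, long row, int column, int partitionCount) {
        return Math.floorMod(Objects.hash(table, row, column), partitionCount);
    }

    // Group a batch of changes into one bucket per partition.
    static List<List<CellChange>> route(List<CellChange> changes, int partitionCount) {
        List<List<CellChange>> buckets = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (CellChange c : changes) {
            buckets.get(partitionFor(c.table, c.row, c.column, partitionCount)).add(c);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<CellChange> changes = new ArrayList<>();
        changes.add(new CellChange("table1", 1L, 1));
        changes.add(new CellChange("table2", 7L, 3));
        changes.add(new CellChange("table4", 2L, 5));

        List<List<CellChange>> buckets = route(changes, 4);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("partition " + i + " gets " + buckets.get(i).size() + " change(s)");
        }
    }
}
```

Because the partition is derived from the key, changes to the same cell
always land in the same partition, while changes to different cells can be
processed side by side.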

Having said that, one approach for the overall design is to create one
operator per cell to achieve parallelism. But then you will have to write
one operator per cell and add it to the DAG every time there is a new
operator. You will also have to add multiple output ports, one for each
downstream operator. Although it will technically work, it will end up in a
lot of boilerplate code.

Another approach is to let one operator handle all cells and partition it
so that multiple user requests are processed in parallel. That way, you
don't have to worry about adding new operators to the DAG, adding new ports
to existing operators, etc. The design is elegant, and you can let the
platform handle the parallel processing by creating multiple partitions of
your operator. This should perform well too. If you want to do parallel
evaluations/writes for each row within the operator, you can separate the
two concerns: evaluate and update the dependency graph in memory within the
operator, and do the I/O to the db asynchronously with multiple threads.
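
As a rough sketch of that last point, the plain-Java class below keeps the
dependency graph in memory, works out which cells are affected by a change,
and then hands the writes to a thread pool. It is not Apex operator code.
The names CellRecalcSketch, addDependency, affectedCells and writeAll are
hypothetical, the writer callback stands in for real db I/O, and the graph
is assumed to be acyclic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Plain-Java illustration only; names are hypothetical and this is NOT the Apex API.
final class CellRecalcSketch {

    // cell id -> cells that must be recalculated when that cell changes
    private final Map<String, List<String>> dependents = new HashMap<>();

    void addDependency(String source, String dependent) {
        dependents.computeIfAbsent(source, k -> new ArrayList<>()).add(dependent);
    }

    // Breadth-first walk from the changed cell. Returns every downstream
    // cell once, in discovery order (assumes the graph has no cycles).
    List<String> affectedCells(String changed) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        while (!queue.isEmpty()) {
            String cell = queue.poll();
            for (String next : dependents.getOrDefault(cell, Collections.<String>emptyList())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    // Run the writer for each cell on a fixed-size pool and wait for all of
    // them to finish. The writer is a stand-in for the real db write.
    static void writeAll(List<String> cells, Consumer<String> writer, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (String cell : cells) {
                futures.add(CompletableFuture.runAsync(() -> writer.accept(cell), pool));
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Dependency graph from the earlier example: O1 feeds O2, O3, O4; O2 feeds O5.
        CellRecalcSketch graph = new CellRecalcSketch();
        graph.addDependency("O1", "O2");
        graph.addDependency("O1", "O3");
        graph.addDependency("O1", "O4");
        graph.addDependency("O2", "O5");

        List<String> affected = graph.affectedCells("O1");
        Set<String> written = ConcurrentHashMap.newKeySet();
        writeAll(affected, written::add, 4);
        System.out.println("recalculated " + affected + ", wrote " + written.size() + " cells");
    }
}
```

The walk over the graph is cheap and stays on one thread, so only the slow
part (the writes) is spread over multiple threads.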

We might have to do a deeper technical discussion. Let me know if you would
like to talk directly for a deeper analysis. We can summarize our findings
here after the discussion.

Regards,
Ashwin.

On Thu, Jan 28, 2016 at 7:50 AM, Sandeep Deshmukh <sa...@datatorrent.com>
wrote:

> Thanks Amit. We have better understanding of your requirements now.
>
> It is not necessary that each cell will be one operator. Please don't get
> biased by that assumption.
>
> Here are few more queries.
> >1. Loading values for unmodified cells
> What is the source of these unmodified cells?
>
> > 3. Execute the cells in parallel (if possible)
> Which cells you are referring to? Table1, row 1, column 1 - that is the
> cells that are changed will trigger dependent cells recalculation or the
> two dependent cells?
>
> Regards
> Sandeep
> On 28-Jan-2016 8:20 pm, "Amit Shah" <am...@gmail.com> wrote:
>
>> Thanks Sandeep for the follow up. I have tried responding to your
>> queries. Kindly let me know if that gives you an idea on what I am trying
>> to achieve
>>
>> how you will be representing your dependencies in a graph
>>
>>
>> Attached a sample dependency graph. I was assuming each cell to be
>> represented as an operator in apex terms so that they could be executed in
>> parallel
>>
>> How many such dependency graphs will be there?
>>
>>
>> Total number of graphs would be approximately equal to the number of rows
>> that could be modified by the user (considering the worst case). The number
>> should be in 1000's.
>>
>> Do you have one graph per change of cell defining its dependent cells? So,
>>> for the example you mentioned, do you define it as O1 dependent cells into
>>> one graph? Then there is another graph which defines what values are
>>> updated if some other cell O7 is updated.
>>
>>
>> Yes approximately one graph per cell. The dependency graph I have tried
>> presenting in the attached diagram could be executed if any of the cell
>> values in table 1, 2 or 4 are updated. For simplicity I have picked up
>> cells from distinct tables.
>>
>> In my view, once the user sees the tables on the UI, we could create the
>> dependency graphs in the background. Once he/she updates a cell value, our
>> application would figure out its corresponding dependency graph and start
>> its execution by
>> 1. Loading values for unmodified cells
>> 2. Determine the cells (or operators) that are to be recalculated. For
>> e.g. if the cell with identifier as table1, row 1, column 1 is updated, the
>> application would determine that 2 cell values are to be updated.
>> 3. Execute the cells in parallel (if possible)
>> 4. Render the updated values in real time to the user.
>>
>> Thanks,
>> Amit.
>>
>> On Thu, Jan 28, 2016 at 7:28 PM, Sandeep Deshmukh <
>> sandeep@datatorrent.com> wrote:
>>
>>> Hi Amit,
>>>
>>> Your concern is that change of one cell is going to trigger update for
>>> large number of cells and you are interested in doing this in parallel to
>>> get real-time response. This can be very well achieved using Apex.
>>>
>>> I think we are still not very clear on your use case and hence what we
>>> have proposed may not match what you are looking for.
>>>
>>> We would like to know how you will be representing your dependencies in
>>> a graph. How many such dependency graphs will be there? Do you have one
>>> graph per change of cell defining its dependent cells? So, for the example
>>> you mentioned, do you define it as O1 dependent cells into one graph? Then
>>> there is another graph which defines what values are updated if some other
>>> cell O7 is updated.
>>>
>>> Once we fully understand your requirements, we should be able to guide
>>> you better.
>>>
>>>
>>> Regards,
>>> Sandeep
>>>
>>> On Thu, Jan 28, 2016 at 2:56 PM, Amit Shah <am...@gmail.com> wrote:
>>>
>>>> Ashwin, Below are follow up queries that I have based on your response.
>>>>
>>>> The store I mentioned is just an abstraction. It can be in memory
>>>>> store, or a cache backed lookup from a database.
>>>>
>>>>
>>>> Yes, I understand the term store, but I didn't follow the need for it.
>>>>
>>>> How does your UI interact with your server today?
>>>>
>>>>
>>>> Our UI is built over angularjs so it communicates with the server
>>>> through REST api's.
>>>>
>>>> You dont have to create a new DAG for each cell you are changing. You
>>>>> can have a single DAG running and send across your query with the cell
>>>>> changes in the schema you define. You can perform all corresponding changes
>>>>> for other cells/table rows in the store operator.
>>>>
>>>>
>>>> I was under the impression that by defining one operator per column
>>>> index I could take advantage of apex running individual operators on
>>>> individual JVMs and hence get parallel writes with real-time or near
>>>> real-time response. If we have a single static DAG that accepts the cell
>>>> identifier (row id, column index and table id) as parameters then we would
>>>> not be able to update cell values concurrently, right?
>>>> If your understanding is different from the flow I explained in my
>>>> previous mail, what do I gain by using apex?
>>>>
>>>>
>>>> Thanks,
>>>> Amit.
>>>>
>>>>
>>>> On Thu, Jan 28, 2016 at 12:51 AM, Ashwin Chandra Putta <
>>>> ashwinchandrap@gmail.com> wrote:
>>>>
>>>>> Amit,
>>>>>
>>>>> The store I mentioned is just an abstraction. It can be in memory
>>>>> store, or a cache backed lookup from a database.
>>>>>
>>>>> For the query/query response, when interacting with a UI - you can
>>>>> send your queries to the query operator and listen for response from the
>>>>> query response operator. Historically we have used json over websockets to
>>>>> interact from browser. How does your UI interact with your server today?
>>>>>
>>>>> You dont have to create a new DAG for each cell you are changing. You
>>>>> can have a single DAG running and send across your query with the cell
>>>>> changes in the schema you define. You can perform all corresponding changes
>>>>> for other cells/table rows in the store operator.
>>>>>
>>>>> If you still want to depend completely on your existing server for
>>>>> loading initial data, then you can load it to a cache in store and do your
>>>>> analysis on that data in memory.
>>>>>
>>>>> Regards,
>>>>> Ashwin.
>>>>>
>>>>> On Wed, Jan 27, 2016 at 7:42 AM, Amol Kekre <am...@datatorrent.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Amit,
>>>>>> Here are some answers
>>>>>> - Logic that you want to run can be coded as a utility that is then
>>>>>> invoked by any other operator
>>>>>> - populateDAG() is today part of the rollout of the app, i.e. it is
>>>>>> similar to "compileTime" and not "runTime". You could do runTime, but then
>>>>>> you will need to go through dtcli. Today runTime changes via dtcli will
>>>>>> need a lot more coding. A very early version of runTime changes (based on
>>>>>> system metrics) exists, but the ask is for changes based on application
>>>>>> data. That ask is in the roadmap of module rollout (phase II?) and others
>>>>>> can comment on the roadmap for runTime populateDAG.
>>>>>> - Outputs of many operators can be streamed as input to one operator
>>>>>> in following ways
>>>>>>    - Each output having different schema will mean different input
>>>>>> ports on that operator as port schema is fixed. This is fine, but will
>>>>>> clutter the DAG
>>>>>>    - If the schema of these output ports is same, there is a merge
>>>>>> operator that does that (
>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/stream/StreamMerger.java).
>>>>>> You can write one for Nx1 merge by extending the above class.
>>>>>>
>>>>>> Thks,
>>>>>> Amol
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 27, 2016 at 6:03 AM, Amit Shah <am...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Ashwin for the follow up.
>>>>>>> I am not sure if I completely follow the query -> store -> query
>>>>>>> pattern. What does query mean here? Why would we need a in-memory store?
>>>>>>> Trying to list down the flow I came up with below points
>>>>>>>
>>>>>>>    1. We need to build a DAG after we get to know the cell (table,
>>>>>>>    row and column index) that is modified by the user.
>>>>>>>    2. Once we receive user input (i.e. once the user modifies a
>>>>>>>    value in a table) the populateDAG() method should be called.
>>>>>>>    3. The populateDAG() implementation would
>>>>>>>       1. Determine what cells should be updated across all tables
>>>>>>>       2. Create an Operator per cell that is affected by the
>>>>>>>       change. From the demo code I see the dag.addOperator method
>>>>>>>       instantiating an operator. Since the logic to update a cell
>>>>>>>       would be the same across tables, how do we create new
>>>>>>>       operators per cell to have a graph that looks like what
>>>>>>>       Bhupesh envisioned in his last email reply? In my view the
>>>>>>>       graph would look like
>>>>>>>
>>>>>>>       O1 (for user modified cell)
>>>>>>>         -> O2 (table X, row Y, column index 2)
>>>>>>>              -> O5 (table E, row F, column index 10000) -> O6 (update UI)
>>>>>>>         -> O3 (table M, row N, column index 3) -> O6 (update UI)
>>>>>>>         -> O4 (table P, row Q, column index 1) -> O6 (update UI)
>>>>>>>
>>>>>>>       3. We want the DAG to be evaluated instantly once the
>>>>>>>       populateDAG() method finishes. How do we do it?
>>>>>>>       4. Can outputs from many operators be streamed as an
>>>>>>>       input to one operator? From the above example outputs from O3, O4
>>>>>>>       and O5 need to go to O6.
>>>>>>>
>>>>>>> I appreciate your inputs on this.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Amit.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 27, 2016 at 1:49 PM, Ashwin Chandra Putta <
>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>
>>>>>>>> Amit,
>>>>>>>>
>>>>>>>> Thanks for the response. You can use the query --> store --> query
>>>>>>>> result pattern to do the real time updates and lookups for what-if analysis.
>>>>>>>>
>>>>>>>> And you can also ingest your real time input data to the store
>>>>>>>> operator. input --> store.
>>>>>>>>
>>>>>>>> That way, you can keep ingesting your data into the store operator
>>>>>>>> where you will keep your OLAP dimensions and measures.
>>>>>>>>
>>>>>>>> For the query/query result pattern example, see this demo:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/demos/mobile/src/main/java/com/datatorrent/demos/mobile/Application.java
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashwin.
>>>>>>>>
>>>>>>>> On Tue, Jan 26, 2016 at 9:52 PM, Amit Shah <am...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Appreciate the discussion we are having on this topic.
>>>>>>>>>
>>>>>>>>> Bhupesh, If I understand the flow correctly, we would have to
>>>>>>>>> define one DAG per cell in the table that could be modified by the user.
>>>>>>>>> Given this, it would be right to define the DAG only when the table is
>>>>>>>>> presented to the user on the UI (not at definition time since there would
>>>>>>>>> be many tables). Would it be possible to define DAG at runtime i.e.
>>>>>>>>> defining & wiring the operators at runtime?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ashwin, I am glad to answer these questions
>>>>>>>>>
>>>>>>>>> 1. We are extending our OLTP based application by introducing
>>>>>>>>> analytical features that includes what-if kind of analysis. Other features
>>>>>>>>> do include performing OLAP kind of operations like aggregation, slice &
>>>>>>>>> dice, drill down/up, pivoting. Our first milestone is to target what-if
>>>>>>>>> kind of analysis. We don't have any implementation so far. We are exploring
>>>>>>>>> out solutions to these requirements
>>>>>>>>> 2. The technical challenges we have include having an in-memory
>>>>>>>>> calculation engine system that supports parallel writes and provides real
>>>>>>>>> time or near real time response time.
>>>>>>>>>
>>>>>>>>> Hope that answers your queries.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Amit.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 25, 2016 at 10:26 PM, Ashwin Chandra Putta <
>>>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Amit,
>>>>>>>>>>
>>>>>>>>>> I have a couple of questions if its not much.
>>>>>>>>>>
>>>>>>>>>> 1. What is the current implementation?
>>>>>>>>>> 2. What are the challenges you are facing?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ashwin.
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I am trying to evaluate apache apex for building an application
>>>>>>>>>> that supports what-if analysis support to users. This co-relates closed
>>>>>>>>>> with excel kind of functionality where changing a value in one cell
>>>>>>>>>> triggers changes in other cell values. In our case we would have multiple
>>>>>>>>>> rows in various tables getting updated when the user changes a row value.
>>>>>>>>>> The response needs to be in real-time or near real-time.
>>>>>>>>>>
>>>>>>>>>> Does Apex fit such an use-case? If so, what would be some of
>>>>>>>>>> initial steps to evaluate it for this use case?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ashwin.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Regards,
>>>>> Ashwin.
>>>>>
>>>>
>>>>
>>>
>>


-- 

Regards,
Ashwin.

Re: What-if analysis with apex

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.

Thanks Amit. We have a better understanding of your requirements now.

It is not necessary for each cell to be one operator. Please don't be
biased by that assumption.

Here are a few more queries.
>1. Loading values for unmodified cells
What is the source of these unmodified cells?

> 3. Execute the cells in parallel (if possible)
Which cells are you referring to? Is it the changed cell (table1, row 1,
column 1), which triggers recalculation of its dependent cells, or the two
dependent cells themselves?

Regards
Sandeep
On 28-Jan-2016 8:20 pm, "Amit Shah" <am...@gmail.com> wrote:

> Thanks Sandeep for the follow up. I have tried responding to your queries.
> Kindly let me know if that gives you an idea on what I am trying to achieve
>
> how you will be representing your dependencies in a graph
>
>
> Attached a sample dependency graph. I was assuming each cell to be
> represented as an operator in apex terms so that they could be executed in
> parallel
>
> How many such dependency graphs will be there?
>
>
> Total number of graphs would be approximately equal to the number of rows
> that could be modified by the user (considering the worst case). The number
> should be in 1000's.
>
> Do you have one graph per change of cell defining its dependent cells? So,
>> for the example you mentioned, do you define it as O1 dependent cells into
>> one graph? Then there is another graph which defines what values are
>> updated if some other cell O7 is updated.
>
>
> Yes approximately one graph per cell. The dependency graph I have tried
> presenting in the attached diagram could be executed if any of the cell
> values in table 1, 2 or 4 are updated. For simplicity I have picked up
> cells from distinct tables.
>
> In my view, once the user sees the tables on the UI, we could create the
> dependency graphs in the background. Once he/she updates a cell value, our
> application would figure out its corresponding dependency graph and start
> its execution by
> 1. Loading values for unmodified cells
> 2. Determine the cells (or operators) that are to be recalculated. For
> e.g. if the cell with identifier as table1, row 1, column 1 is updated, the
> application would determine that 2 cell values are to be updated.
> 3. Execute the cells in parallel (if possible)
> 4. Render the updated values in real time to the user.
>
> Thanks,
> Amit.
>
> On Thu, Jan 28, 2016 at 7:28 PM, Sandeep Deshmukh <sandeep@datatorrent.com
> > wrote:
>
>> Hi Amit,
>>
>> Your concern is that change of one cell is going to trigger update for
>> large number of cells and you are interested in doing this in parallel to
>> get real-time response. This can be very well achieved using Apex.
>>
>> I think we are still not very clear on your use case and hence what we
>> have proposed may not match what you are looking for.
>>
>> We would like to know how you will be representing your dependencies in a
>> graph. How many such dependency graphs will be there? Do you have one graph
>> per change of cell defining its dependent cells? So, for the example you
>> mentioned, do you define it as O1 dependent cells into one graph? Then
>> there is another graph which defines what values are updated if some other
>> cell O7 is updated.
>>
>> Once we fully understand your requirements, we should be able to guide
>> you better.
>>
>>
>> Regards,
>> Sandeep
>>
>> On Thu, Jan 28, 2016 at 2:56 PM, Amit Shah <am...@gmail.com> wrote:
>>
>>> Ashwin, Below are follow up queries that I have based on your response.
>>>
>>> The store I mentioned is just an abstraction. It can be in memory store,
>>>> or a cache backed lookup from a database.
>>>
>>>
>>> Yes, I understand the term store, but I didn't follow the need for it.
>>>
>>> How does your UI interact with your server today?
>>>
>>>
>>> Our UI is built over angularjs so it communicates with the server
>>> through REST api's.
>>>
>>> You dont have to create a new DAG for each cell you are changing. You
>>>> can have a single DAG running and send across your query with the cell
>>>> changes in the schema you define. You can perform all corresponding changes
>>>> for other cells/table rows in the store operator.
>>>
>>>
>>> I was under the impression that by defining one operator per column
>>> index I could take advantage of apex running individual operators on
>>> individual JVMs and hence get parallel writes with real-time or near
>>> real-time response. If we have a single static DAG that accepts the cell
>>> identifier (row id, column index and table id) as parameters then we would
>>> not be able to update cell values concurrently, right?
>>> If your understanding is different from the flow I explained in my
>>> previous mail, what do I gain by using apex?
>>>
>>>
>>> Thanks,
>>> Amit.
>>>
>>>
>>> On Thu, Jan 28, 2016 at 12:51 AM, Ashwin Chandra Putta <
>>> ashwinchandrap@gmail.com> wrote:
>>>
>>>> Amit,
>>>>
>>>> The store I mentioned is just an abstraction. It can be in memory
>>>> store, or a cache backed lookup from a database.
>>>>
>>>> For the query/query response, when interacting with a UI - you can send
>>>> your queries to the query operator and listen for response from the query
>>>> response operator. Historically we have used json over websockets to
>>>> interact from browser. How does your UI interact with your server today?
>>>>
>>>> You dont have to create a new DAG for each cell you are changing. You
>>>> can have a single DAG running and send across your query with the cell
>>>> changes in the schema you define. You can perform all corresponding changes
>>>> for other cells/table rows in the store operator.
>>>>
>>>> If you still want to depend completely on your existing server for
>>>> loading initial data, then you can load it to a cache in store and do your
>>>> analysis on that data in memory.
>>>>
>>>> Regards,
>>>> Ashwin.
>>>>
>>>> On Wed, Jan 27, 2016 at 7:42 AM, Amol Kekre <am...@datatorrent.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Amit,
>>>>> Here are some answers
>>>>> - Logic that you want to run can be coded as a utility that is then
>>>>> invoked by any other operator
>>>>> - populateDAG() is today part of the rollout of the app, i.e. it is
>>>>> similar to "compileTime" and not "runTime". You could do runTime, but then
>>>>> you will need to go through dtcli. Today runTime changes via dtcli will
>>>>> need a lot more coding. A very early version of runTime changes (based on
>>>>> system metrics) exists, but the ask is for changes based on application
>>>>> data. That ask is in the roadmap of module rollout (phase II?) and others
>>>>> can comment on the roadmap for runTime populateDAG.
>>>>> - Outputs of many operators can be streamed as input to one operator
>>>>> in following ways
>>>>>    - Each output having different schema will mean different input
>>>>> ports on that operator as port schema is fixed. This is fine, but will
>>>>> clutter the DAG
>>>>>    - If the schema of these output ports is same, there is a merge
>>>>> operator that does that (
>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/stream/StreamMerger.java).
>>>>> You can write one for Nx1 merge by extending the above class.
>>>>>
>>>>> Thks,
>>>>> Amol
>>>>>
>>>>>
>>>>> On Wed, Jan 27, 2016 at 6:03 AM, Amit Shah <am...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Ashwin for the follow up.
>>>>>> I am not sure if I completely follow the query -> store -> query
>>>>>> pattern. What does query mean here? Why would we need a in-memory store?
>>>>>> Trying to list down the flow I came up with below points
>>>>>>
>>>>>>    1. We need to build a DAG after we get to know the cell (table,
>>>>>>    row and column index) that is modified by the user.
>>>>>>    2. Once we receive user input (i.e. once the user modifies a
>>>>>>    value in a table) the populateDAG() method should be called.
>>>>>>    3. The populateDAG() implementation would
>>>>>>       1. Determine what cells should be updated across all tables
>>>>>>       2. Create an Operator per cell that is affected by the
>>>>>>       change. From the demo code I see the dag.addOperator method
>>>>>>       instantiating an operator. Since the logic to update a cell
>>>>>>       would be the same across tables, how do we create new
>>>>>>       operators per cell to have a graph that looks like what
>>>>>>       Bhupesh envisioned in his last email reply? In my view the
>>>>>>       graph would look like
>>>>>>
>>>>>>       O1 (for user modified cell)
>>>>>>         -> O2 (table X, row Y, column index 2)
>>>>>>              -> O5 (table E, row F, column index 10000) -> O6 (update UI)
>>>>>>         -> O3 (table M, row N, column index 3) -> O6 (update UI)
>>>>>>         -> O4 (table P, row Q, column index 1) -> O6 (update UI)
>>>>>>
>>>>>>       3. We want the DAG to be evaluated instantly once the
>>>>>>       populateDAG() method finishes. How do we do it?
>>>>>>       4. Can outputs from many operators be streamed as an
>>>>>>       input to one operator? From the above example outputs from O3, O4
>>>>>>       and O5 need to go to O6.
>>>>>>
>>>>>> I appreciate your inputs on this.
>>>>>>
>>>>>> Thanks,
>>>>>> Amit.
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 27, 2016 at 1:49 PM, Ashwin Chandra Putta <
>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>
>>>>>>> Amit,
>>>>>>>
>>>>>>> Thanks for the response. You can use the query --> store --> query
>>>>>>> result pattern to do the real time updates and lookups for what-if analysis.
>>>>>>>
>>>>>>> And you can also ingest your real time input data to the store
>>>>>>> operator. input --> store.
>>>>>>>
>>>>>>> That way, you can keep ingesting your data into the store operator
>>>>>>> where you will keep your OLAP dimensions and measures.
>>>>>>>
>>>>>>> For the query/query result pattern example, see this demo:
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/apache/incubator-apex-malhar/blob/master/demos/mobile/src/main/java/com/datatorrent/demos/mobile/Application.java
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ashwin.
>>>>>>>
>>>>>>> On Tue, Jan 26, 2016 at 9:52 PM, Amit Shah <am...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Appreciate the discussion we are having on this topic.
>>>>>>>>
>>>>>>>> Bhupesh, If I understand the flow correctly, we would have to
>>>>>>>> define one DAG per cell in the table that could be modified by the user.
>>>>>>>> Given this, it would be right to define the DAG only when the table is
>>>>>>>> presented to the user on the UI (not at definition time since there would
>>>>>>>> be many tables). Would it be possible to define DAG at runtime i.e.
>>>>>>>> defining & wiring the operators at runtime?
>>>>>>>>
>>>>>>>>
>>>>>>>> Ashwin, I am glad to answer these questions
>>>>>>>>
>>>>>>>> 1. We are extending our OLTP based application by introducing
>>>>>>>> analytical features that includes what-if kind of analysis. Other features
>>>>>>>> do include performing OLAP kind of operations like aggregation, slice &
>>>>>>>> dice, drill down/up, pivoting. Our first milestone is to target what-if
>>>>>>>> kind of analysis. We don't have any implementation so far. We are exploring
>>>>>>>> out solutions to these requirements
>>>>>>>> 2. The technical challenges we have include having an in-memory
>>>>>>>> calculation engine system that supports parallel writes and provides real
>>>>>>>> time or near real time response time.
>>>>>>>>
>>>>>>>> Hope that answers your queries.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Amit.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 25, 2016 at 10:26 PM, Ashwin Chandra Putta <
>>>>>>>> ashwinchandrap@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Amit,
>>>>>>>>>
>>>>>>>>> I have a couple of questions if its not much.
>>>>>>>>>
>>>>>>>>> 1. What is the current implementation?
>>>>>>>>> 2. What are the challenges you are facing?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ashwin.
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am trying to evaluate apache apex for building an application
>>>>>>>>> that supports what-if analysis support to users. This co-relates closed
>>>>>>>>> with excel kind of functionality where changing a value in one cell
>>>>>>>>> triggers changes in other cell values. In our case we would have multiple
>>>>>>>>> rows in various tables getting updated when the user changes a row value.
>>>>>>>>> The response needs to be in real-time or near real-time.
>>>>>>>>>
>>>>>>>>> Does Apex fit such an use-case? If so, what would be some of
>>>>>>>>> initial steps to evaluate it for this use case?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ashwin.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Regards,
>>>> Ashwin.
>>>>
>>>
>>>
>>
>

Re: What-if analysis with apex

Posted by Amit Shah <am...@gmail.com>.

Thanks, Sandeep, for the follow-up. I have tried to answer your questions
below. Kindly let me know whether this gives you an idea of what I am trying
to achieve.

> how you will be representing your dependencies in a graph


I have attached a sample dependency graph. I was assuming that each cell
would be represented as an operator, in Apex terms, so that the cells could
be recalculated in parallel.

> How many such dependency graphs will be there?


The total number of graphs would be approximately equal to the number of rows
that could be modified by the user (considering the worst case). The number
should be in the thousands.

> Do you have one graph per change of cell defining its dependent cells? So,
> for the example you mentioned, do you define it as O1 dependent cells into
> one graph? Then there is another graph which defines what values are
> updated if some other cell O7 is updated.


Yes, approximately one graph per cell. The dependency graph shown in the
attached diagram would be executed if any of the cell values in table 1, 2
or 4 is updated. For simplicity I have picked cells from distinct tables.

In my view, once the user sees the tables on the UI, we could create the
dependency graphs in the background. Once he/she updates a cell value, our
application would look up the corresponding dependency graph and start
executing it by
1. Loading the values of the unmodified cells.
2. Determining the cells (or operators) that are to be recalculated. For
example, if the cell identified as table 1, row 1, column 1 is updated, the
application would determine that 2 cell values are to be updated.
3. Recalculating those cells in parallel (if possible).
4. Rendering the updated values to the user in real time.
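
Step 2 above can be sketched in plain Java, independent of Apex. This is only a
minimal illustration under assumptions: the class name CellDependencyGraph, the
"table:row:column" string ids and the method names are all hypothetical, and
the dependencies are assumed to be acyclic. It walks the graph from the changed
cell and returns the affected cells in an order where each cell comes after the
affected cells it depends on.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not an Apex class): records which cells read from which.
class CellDependencyGraph {
    // Maps a cell id such as "table1:1:1" to the cells that must be
    // recalculated when it changes.
    private final Map<String, List<String>> dependents = new HashMap<>();

    // Records that `target` is derived from `source`.
    void addDependency(String source, String target) {
        dependents.computeIfAbsent(source, k -> new ArrayList<>()).add(target);
    }

    // Returns all cells downstream of `changed`, ordered so that a cell appears
    // only after every affected cell it depends on. Assumes no cycles.
    List<String> affectedCells(String changed) {
        // Pass 1: breadth-first walk to collect every downstream cell.
        Set<String> reachable = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(changed);
        while (!frontier.isEmpty()) {
            String cell = frontier.poll();
            for (String next : dependents.getOrDefault(cell, Collections.emptyList())) {
                if (reachable.add(next)) {
                    frontier.add(next);
                }
            }
        }
        // Pass 2: count how many affected cells each affected cell waits on.
        Map<String, Integer> waitingOn = new LinkedHashMap<>();
        for (String cell : reachable) {
            waitingOn.put(cell, 0);
        }
        for (String cell : reachable) {
            for (String next : dependents.getOrDefault(cell, Collections.emptyList())) {
                waitingOn.merge(next, 1, Integer::sum);
            }
        }
        // Pass 3: Kahn's algorithm. Cells that wait on nothing are ready
        // together, so they are candidates for parallel recalculation.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : waitingOn.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String cell = ready.poll();
            order.add(cell);
            for (String next : dependents.getOrDefault(cell, Collections.emptyList())) {
                if (waitingOn.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        return order;
    }
}
```

Because the result depends only on the stored edges, one such structure could
serve every editable cell, rather than one prebuilt graph per cell; whether
that fits the real rule set is an open question.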

Thanks,
Amit.


Re: What-if analysis with apex

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.

Hi Amit,

Your concern is that a change to one cell is going to trigger updates to a
large number of cells, and you are interested in doing this in parallel to
get a real-time response. This can very well be achieved using Apex.
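
To illustrate the kind of parallelism meant here, below is a minimal plain-Java
sketch of key-based routing: each cell update is assigned to one of N parallel
workers by hashing its identifier, so updates to different cells can proceed
side by side while updates to the same cell always land on the same worker. It
does not use any Apex API; CellKey, CellPartitioner and the hash-based routing
rule are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical cell identifier: table id, row id and column index.
final class CellKey {
    final String table;
    final long row;
    final int column;

    CellKey(String table, long row, int column) {
        this.table = table;
        this.row = row;
        this.column = column;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof CellKey)) {
            return false;
        }
        CellKey that = (CellKey) other;
        return table.equals(that.table) && row == that.row && column == that.column;
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, row, column);
    }
}

// Hypothetical router: decides which parallel worker handles a given cell.
class CellPartitioner {
    // floorMod keeps the result in [0, partitionCount) even for negative hashes.
    static int partitionFor(CellKey key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Groups a batch of cell updates by the worker that would process them.
    static Map<Integer, List<CellKey>> groupByPartition(List<CellKey> cells, int partitionCount) {
        Map<Integer, List<CellKey>> groups = new TreeMap<>();
        for (CellKey cell : cells) {
            int partition = partitionFor(cell, partitionCount);
            groups.computeIfAbsent(partition, k -> new ArrayList<>()).add(cell);
        }
        return groups;
    }
}
```

With this kind of routing the graph itself stays fixed; only the number of
workers changes with load.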

I think we are still not very clear on your use case, and hence what we have
proposed may not match what you are looking for.

We would like to know how you will be representing your dependencies in a
graph. How many such dependency graphs will there be? Do you have one graph
per cell change, defining its dependent cells? So, for the example you
mentioned, do you define O1 and its dependent cells as one graph? And is
there then another graph which defines what values are updated if some other
cell, O7, is updated?

Once we fully understand your requirements, we should be able to guide you
better.


Regards,
Sandeep


Re: What-if analysis with apex

Posted by Amit Shah <am...@gmail.com>.

Ashwin, below are the follow-up questions I have based on your response.

> The store I mentioned is just an abstraction. It can be in memory store, or
> a cache backed lookup from a database.


Yes, I understand the term store, but I didn't follow why it is needed.

> How does your UI interact with your server today?


Our UI is built on AngularJS, so it communicates with the server through
REST APIs.

> You dont have to create a new DAG for each cell you are changing. You can
> have a single DAG running and send across your query with the cell changes
> in the schema you define. You can perform all corresponding changes for
> other cells/table rows in the store operator.


I was under the impression that by defining one operator per column index I
could take advantage of Apex running individual operators in individual
JVMs, and hence get parallel writes with real-time or near real-time
response times. If we have a single static DAG that accepts the cell
identifier (row id, column index and table id) as parameters, then we would
not be able to update cell values concurrently, right?
If your understanding differs from the flow I explained in my previous
mail, what do I gain by using Apex?


Thanks,
Amit.


On Thu, Jan 28, 2016 at 12:51 AM, Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> Amit,
>
> The store I mentioned is just an abstraction. It can be in memory store,
> or a cache backed lookup from a database.
>
> For the query/query response, when interacting with a UI - you can send
> your queries to the query operator and listen for response from the query
> response operator. Historically we have used json over websockets to
> interact from browser. How does your UI interact with your server today?
>
> You dont have to create a new DAG for each cell you are changing. You can
> have a single DAG running and send across your query with the cell changes
> in the schema you define. You can perform all corresponding changes for
> other cells/table rows in the store operator.
>
> If you still want to depend completely on your existing server for loading
> initial data, then you can load it to a cache in store and do your analysis
> on that data in memory.
>
> Regards,
> Ashwin.
>
> On Wed, Jan 27, 2016 at 7:42 AM, Amol Kekre <am...@datatorrent.com> wrote:
>
>>
>> Amit,
>> Here are some answers
>> - Logic that you want to run can be coded as an utility, that is then
>> invoked by any other operator
>> - PopulateDAG() is today part of roll out of the app, i.e it is similar
>> to "compileTime" and not "runTime". You could do runTime, but then you will
>> need to go through dtcli. Today runTime changes via dtcli will need a lot
>> more coding. A very early version of runTime changes (based on system
>> metrics) exist, but the ask is for changes based on application data. That
>> ask is in the roadmap of module rollout (phase II?) and others can comment
>> on the roadmap for runtTime populateDAG.
>> - Outputs of many operators can be streamed as input to one operator in
>> following ways
>>    - Each output having different schema will mean different input ports
>> on that operator as port schema is fixed. This is fine, but will clutter
>> the DAG
>>    - If the schema of these output ports is same, there is a merge
>> operator that does that (
>> https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/stream/StreamMerger.java).
>> You can write one for Nx1 merge by extending the above class.
>>
>> Thks,
>> Amol
>>
>>
>> On Wed, Jan 27, 2016 at 6:03 AM, Amit Shah <am...@gmail.com> wrote:
>>
>>> Thanks Ashwin for the follow up.
>>> I am not sure if I completely follow the query -> store -> query
>>> pattern. What does query mean here? Why would we need a in-memory store?
>>> Trying to list down the flow I came up with below points
>>>
>>>    1. We need to build a DAG after we get to know the cell (table, row
>>>    and column index) that is modified by the user.
>>>    2. Once we receive user input (i.e. once the user modifies a value
>>>    in a table) the populateDAG() method should be called.
>>>    3. The populateDAG() implementation would
>>>    1. Determine what cells should be updated across all tables
>>>       2. Create an Operator per cell that is affected by the change.
>>>       From the demo code I see dag.addOperator method instantiating an
>>>       operator. Since the logic to update an cell would be the same
>>>       across tables how do we create new operators per cell to have a graph that
>>>       looks what Bhupesh envisioned in his last email reply? In my view
>>>       the graph would like
>>>
>>>                     O1 (for user modified cell) -> O2 (table X, row Y,
>>> column index 2) -> O5 (table E, row F, column index 10000)
>>>                                                                O3
>>> (table M, row N, column index 3)
>>>                       ->  O6 (update UI)
>>>                                                                O4
>>> (table P, row Q, column index 1)
>>>
>>>               3. We want the DAG to be evaluated instantly once the
>>> populateDAG() method finishes. How do we do it?
>>>               4. Can outputs from many operators be streamed as an
>>> input to one operator? From the above example outputs from O3, O4 and
>>> O5 need to go to O6.
>>>
>>> I appreciate your inputs on this.
>>>
>>> Thanks,
>>> Amit.

Re: What-if analysis with apex

Posted by Ashwin Chandra Putta <as...@gmail.com>.

Amit,

The store I mentioned is just an abstraction. It can be an in-memory store
or a cache-backed lookup from a database.

For the query/query response, when interacting with a UI, you can send
your queries to the query operator and listen for the response from the
query response operator. Historically we have used JSON over WebSockets to
interact from the browser. How does your UI interact with your server today?

You don't have to create a new DAG for each cell you are changing. You can
have a single DAG running and send across your query with the cell changes
in the schema you define. You can perform all corresponding changes for
other cells/table rows in the store operator.

If you still want to depend completely on your existing server for loading
initial data, then you can load it into a cache in the store and do your
analysis on that data in memory.
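
To make the single-DAG idea concrete, here is a minimal plain-Java sketch of
what the logic inside a store operator could look like: one long-lived store
receives a cell-change query and applies the dependent updates itself. The
names WhatIfStore, Rule and the "table.row.column" cell keys are hypothetical
and are not Apex or Malhar APIs; the sketch also assumes the rules contain no
cycles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: none of these names come from Apex or Malhar.
final class WhatIfStore {

    // A rule says: whenever the source cell changes, recompute the target
    // cell from the new source value.
    private static final class Rule {
        final String target;
        final DoubleUnaryOperator formula;

        Rule(String target, DoubleUnaryOperator formula) {
            this.target = target;
            this.formula = formula;
        }
    }

    private final Map<String, Double> cells = new HashMap<>();
    private final Map<String, List<Rule>> rulesBySource = new HashMap<>();

    // Initial data, e.g. loaded once from the existing server or database.
    void load(String cell, double value) {
        cells.put(cell, value);
    }

    void addRule(String source, String target, DoubleUnaryOperator formula) {
        rulesBySource.computeIfAbsent(source, k -> new ArrayList<>())
                     .add(new Rule(target, formula));
    }

    Double get(String cell) {
        return cells.get(cell);
    }

    // Handles one "query" (a cell change) and returns every cell that
    // changed, in update order. This map plays the role of the query result.
    Map<String, Double> applyChange(String cell, double newValue) {
        Map<String, Double> changed = new LinkedHashMap<>();
        propagate(cell, newValue, changed);
        return changed;
    }

    // Depth-first propagation; assumes the rules form no cycles.
    private void propagate(String cell, double value, Map<String, Double> changed) {
        cells.put(cell, value);
        changed.put(cell, value);
        for (Rule rule : rulesBySource.getOrDefault(cell, Collections.emptyList())) {
            propagate(rule.target, rule.formula.applyAsDouble(value), changed);
        }
    }

    public static void main(String[] args) {
        WhatIfStore store = new WhatIfStore();
        store.load("A.X.5", 10.0);
        store.addRule("A.X.5", "B.M.2", v -> v * 2);
        store.addRule("B.M.2", "C.Q.1", v -> v + 1);
        // prints {A.X.5=20.0, B.M.2=40.0, C.Q.1=41.0}
        System.out.println(store.applyChange("A.X.5", 20.0));
    }
}
```

In an actual application this logic would presumably live inside an operator
that receives the query tuples and emits the returned map as the query result.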

Regards,
Ashwin.


Re: What-if analysis with apex

Posted by Amol Kekre <am...@datatorrent.com>.

Amit,
Here are some answers:
- The logic that you want to run can be coded as a utility that is then
invoked by any operator.
- populateDAG() is today part of the rollout of the app, i.e. it is similar
to "compile time" and not "run time". You could make changes at run time,
but then you will need to go through dtcli. Today, run-time changes via dtcli
need a lot more coding. A very early version of run-time changes (based on
system metrics) exists, but the ask here is for changes based on application
data. That ask is on the roadmap of the module rollout (phase II?), and
others can comment on the roadmap for run-time populateDAG().
- Outputs of many operators can be streamed as input to one operator in the
following ways:
   - If each output has a different schema, that operator needs a separate
input port per output, because a port's schema is fixed. This is fine, but it
will clutter the DAG.
   - If the schema of these output ports is the same, there is a merge
operator that does that (
https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/stream/StreamMerger.java).
You can write one for an Nx1 merge by extending the above class.
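
As an illustration of the same-schema case, the following plain-Java sketch
models an Nx1 merge: N input "ports" forward every tuple unchanged to a single
output. It is only an analogy under stated assumptions; FanInMerger and its
methods are hypothetical and deliberately do not use the Apex port classes or
the StreamMerger linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical plain-Java model of an N-to-1 merge of same-schema streams.
final class FanInMerger<T> {
    private final List<Consumer<T>> inputs = new ArrayList<>();

    FanInMerger(int inputCount, Consumer<T> output) {
        for (int i = 0; i < inputCount; i++) {
            // Each input port passes its tuple through to the one output.
            inputs.add(tuple -> output.accept(tuple));
        }
    }

    // Returns the input port with the given zero-based index.
    Consumer<T> input(int index) {
        return inputs.get(index);
    }

    public static void main(String[] args) {
        List<String> ui = new ArrayList<>();
        FanInMerger<String> merger = new FanInMerger<>(3, ui::add);
        merger.input(0).accept("O3: table M, row N updated");
        merger.input(1).accept("O4: table P, row Q updated");
        merger.input(2).accept("O5: table E, row F updated");
        System.out.println(ui.size()); // prints 3
    }
}
```

Tuples arrive at the output in whatever order the inputs deliver them, so a
downstream consumer should not rely on any ordering across inputs.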

Thanks,
Amol



Re: What-if analysis with apex

Posted by Amit Shah <am...@gmail.com>.

Thanks Ashwin for the follow up.
I am not sure I completely follow the query -> store -> query result pattern.
What does query mean here? Why would we need an in-memory store?
Trying to list the flow, I came up with the points below:

   1. We need to build a DAG after we get to know the cell (table, row and
      column index) that is modified by the user.
   2. Once we receive user input (i.e. once the user modifies a value in a
      table), the populateDAG() method should be called.
   3. The populateDAG() implementation would
      1. Determine which cells should be updated across all tables.
      2. Create an operator per cell that is affected by the change. From
         the demo code I see the dag.addOperator method instantiating an
         operator. Since the logic to update a cell would be the same across
         tables, how do we create new operators per cell to have a graph that
         looks like what Bhupesh envisioned in his last email reply? In my
         view the graph would look like

         O1 (user-modified cell)
           -> O2 (table X, row Y, column index 2)
                -> O5 (table E, row F, column index 10000) -> O6 (update UI)
           -> O3 (table M, row N, column index 3)          -> O6 (update UI)
           -> O4 (table P, row Q, column index 1)          -> O6 (update UI)

      3. We want the DAG to be evaluated instantly once the populateDAG()
         method finishes. How do we do it?
      4. Can outputs from many operators be streamed as input to one
         operator? In the example above, outputs from O3, O4 and O5 need to
         go to O6.
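
The graph above is a dependency graph, and the order in which its cells would
need to be recalculated can be sketched independently of Apex. The plain-Java
example below uses Kahn's topological sort restricted to the cells reachable
from the changed cell. RecalcOrder and its methods are hypothetical names,
the O1..O6 labels are taken from the graph above, and the sketch assumes the
dependencies contain no cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: computes a recalculation order for dependent cells.
final class RecalcOrder {
    // Maps a cell to the cells that depend on it directly.
    private final Map<String, List<String>> dependents = new HashMap<>();

    void addDependency(String from, String to) {
        dependents.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    private List<String> next(String cell) {
        return dependents.getOrDefault(cell, Collections.emptyList());
    }

    // Kahn's algorithm over the subgraph reachable from the changed cell.
    List<String> orderFrom(String changed) {
        // Step 1: count incoming edges inside the reachable subgraph.
        Map<String, Integer> inDegree = new HashMap<>();
        Deque<String> toVisit = new ArrayDeque<>();
        inDegree.put(changed, 0);
        toVisit.push(changed);
        while (!toVisit.isEmpty()) {
            String cell = toVisit.pop();
            for (String dep : next(cell)) {
                if (!inDegree.containsKey(dep)) {
                    inDegree.put(dep, 0);
                    toVisit.push(dep);
                }
                inDegree.merge(dep, 1, Integer::sum);
            }
        }
        // Step 2: emit a cell only after all of its inputs were emitted.
        List<String> order = new ArrayList<>();
        Deque<String> ready = new ArrayDeque<>();
        ready.add(changed);
        while (!ready.isEmpty()) {
            String cell = ready.remove();
            order.add(cell);
            for (String dep : next(cell)) {
                if (inDegree.merge(dep, -1, Integer::sum) == 0) {
                    ready.add(dep);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        RecalcOrder graph = new RecalcOrder();
        graph.addDependency("O1", "O2");
        graph.addDependency("O1", "O3");
        graph.addDependency("O1", "O4");
        graph.addDependency("O2", "O5");
        graph.addDependency("O3", "O6");
        graph.addDependency("O4", "O6");
        graph.addDependency("O5", "O6");
        // prints [O1, O2, O3, O4, O5, O6]
        System.out.println(graph.orderFrom("O1"));
    }
}
```

O6 comes last because it is only released once O3, O4 and O5 have all been
processed, which matches the fan-in described in point 4.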

I appreciate your inputs on this.

Thanks,
Amit.



Re: What-if analysis with apex

Posted by Ashwin Chandra Putta <as...@gmail.com>.

Amit,

Thanks for the response. You can use the query --> store --> query result
pattern to do the real-time updates and lookups for what-if analysis.

You can also ingest your real-time input data into the store operator:
input --> store.

That way, you can keep ingesting your data into the store operator, where
you will keep your OLAP dimensions and measures.

For the query/query result pattern example, see this demo:

https://github.com/apache/incubator-apex-malhar/blob/master/demos/mobile/src/main/java/com/datatorrent/demos/mobile/Application.java
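
Separately from the linked demo, the two flows (input --> store and
query --> store --> query result) can be sketched in plain Java as a store
that keeps one running total per dimension value. MeasureStore, the
"region"/"product" dimensions and the sample amounts are hypothetical and
chosen only for illustration; this is not how the demo or any Malhar operator
is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a store holding measures aggregated by dimension.
final class MeasureStore {
    // Running totals keyed by "dimension=value", e.g. "region=EU".
    private final Map<String, Double> totals = new HashMap<>();

    // input --> store: add one measure to the total of each of its dimensions.
    void ingest(Map<String, String> dimensions, double measure) {
        for (Map.Entry<String, String> d : dimensions.entrySet()) {
            totals.merge(d.getKey() + "=" + d.getValue(), measure, Double::sum);
        }
    }

    // query --> store --> query result: look up the total for one dimension value.
    double query(String dimension, String value) {
        return totals.getOrDefault(dimension + "=" + value, 0.0);
    }

    static Map<String, String> dims(String region, String product) {
        Map<String, String> d = new HashMap<>();
        d.put("region", region);
        d.put("product", product);
        return d;
    }

    public static void main(String[] args) {
        MeasureStore store = new MeasureStore();
        store.ingest(dims("EU", "widget"), 100.0);
        store.ingest(dims("EU", "gadget"), 50.0);
        store.ingest(dims("US", "widget"), 30.0);
        System.out.println(store.query("region", "EU"));      // prints 150.0
        System.out.println(store.query("product", "widget")); // prints 130.0
    }
}
```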

Regards,
Ashwin.


Re: What-if analysis with apex

Posted by Amit Shah <am...@gmail.com>.

Appreciate the discussion we are having on this topic.

Bhupesh, if I understand the flow correctly, we would have to define one
DAG per cell in the table that could be modified by the user. Given this,
it would be right to define the DAG only when the table is presented to the
user on the UI (not at definition time, since there would be many tables).
Would it be possible to define the DAG at runtime, i.e. to define and wire
the operators at runtime?

Ashwin, I am glad to answer these questions.

1. We are extending our OLTP-based application by introducing analytical
features that include what-if analysis. Other features include OLAP-style
operations such as aggregation, slice & dice, drill down/up and pivoting.
Our first milestone is to target what-if analysis. We don't have any
implementation so far. We are exploring solutions to these requirements.
2. The technical challenges we have include building an in-memory
calculation engine that supports parallel writes and provides real-time or
near real-time response times.

Hope that answers your queries.

Thanks,
Amit.


Re: What-if analysis with apex

Posted by Ashwin Chandra Putta <as...@gmail.com>.

Amit,

I have a couple of questions, if it's not too much.

1. What is the current implementation?
2. What are the challenges you are facing?

Regards,
Ashwin.