Posted to dev@asterixdb.apache.org by Steven Jacobs <sj...@ucr.edu> on 2017/11/16 20:43:10 UTC

MultiTransactionJobletEventListenerFactory

Hi all,
We currently have MultiTransactionJobletEventListenerFactory, which allows
for one Hyracks job to run multiple Asterix transactions together.

This class is only used by feeds, and feeds are in the process of changing to
no longer need this feature. As part of the work on pre-deploying job
specifications to be used by multiple Hyracks jobs, I've been working on
removing the transaction id from the job specifications, as we use a new
transaction for each invocation of a deployed job.

There is currently no clear way to remove the transaction id from the job
spec and keep the option for MultiTransactionJobletEventListenerFactory.

The question for the group is: do we see a need to maintain this class that
will no longer be used by any current code? Or, in other words, is there a
strong possibility that in the future we will want multiple transactions to
share a single Hyracks job, meaning that it is worth figuring out how to
maintain this class?

Steven

Re: MultiTransactionJobletEventListenerFactory

Posted by abdullah alamoudi <ba...@gmail.com>.
Keep in mind that one option looks up the map once per job while the other looks it up once per record.

Cheers,
Abdullah.

> On Nov 17, 2017, at 2:23 PM, Xikui Wang <xi...@uci.edu> wrote:
> 
> If I understand Abdullah's proposal correctly, for option 1, you can create
> a dataset id to transaction id map in the
> MultiTransactionJobletEventListener. When committing, the commit runtime
> can take the dataset-id to ask for the transaction-id and commit the
> sub-transaction. Here we are putting up an assumption that there will not
> be a feed connected to a dataset twice in a feed job. This is a fair
> assumption in most cases.
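[Editorial sketch: option 1 as described above could look roughly like the following. This is an illustrative stand-in under stated assumptions, not the actual AsterixDB/Hyracks API; the class and method names are made up.]

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of option 1: the joblet event listener keeps a
// dataset-id -> transaction-id map, and the commit runtime looks up the
// sub-transaction to commit using the dataset id it writes to.
// All names here are illustrative, not the real AsterixDB classes.
class MultiTxnListenerSketch {
    private final Map<Integer, Long> txnIdByDataset = new HashMap<>();

    // Called once per connection when the combined feed job is assembled.
    // Assumes a feed never connects to the same dataset twice in one job;
    // a second registration would silently shadow the first.
    void register(int datasetId, long txnId) {
        txnIdByDataset.put(datasetId, txnId);
    }

    // Called by the commit runtime of each connection. Note this is a
    // per-record path, which is the lookup cost Abdullah points out.
    long txnIdFor(int datasetId) {
        Long txnId = txnIdByDataset.get(datasetId);
        if (txnId == null) {
            throw new IllegalStateException("no transaction for dataset " + datasetId);
        }
        return txnId;
    }
}
```
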
> 
> But, this doesn't really solve the problem, right? IMHO, if we all agree
> that there is a one-to-one mapping from transaction to Hyracks job, we
> probably should use a single transaction id for the combined job, i.e.,
> option 2... As Murtadha suggested, we can now register multiple resources
> with the transaction context with the patch he merged yesterday (it took me
> some time to catch up on the transaction codebase so sorry for joining
> late.). I think this can offer us a nice and clean solution.
> 
> Best,
> Xikui
> 
> On Fri, Nov 17, 2017 at 11:58 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> 
>> If that's true then that solution seems best to me, but we had discussed
>> this earlier and Xikui mentioned that that might not be true.
>> @Xikui?
>> Steven
>> 
>> On Fri, Nov 17, 2017 at 11:55 AM, abdullah alamoudi <ba...@gmail.com>
>> wrote:
>> 
>>> Right now, they can't, so datasetId can be safely used.
>>>> On Nov 17, 2017, at 11:51 AM, Steven Jacobs <sj...@ucr.edu> wrote:
>>>> 
>>>> For option 1, I think the dataset id is not a unique identifier.
>> Couldn't
>>>> multiple transactions in one job work on the same dataset?
>>>> 
>>>> Steven
>>>> 
>>>> On Fri, Nov 17, 2017 at 11:38 AM, abdullah alamoudi <
>> bamousaa@gmail.com>
>>>> wrote:
>>>> 
>>>>> So, there are three options to do this:
>>>>> 1. Each of these operators works on a specific dataset. So we can
>> pass
>>>>> the datasetId to the JobEventListenerFactory when requesting the
>>>>> transaction id.
>>>>> 2. We make one transaction work for multiple datasets by using a map
>> from
>>>>> datasetId to primary opTracker and use it when reporting commits by
>> the
>>> log
>>>>> flusher thread.
>>>>> 3. Prevent a job from having multiple transactions. (For the record, I
>>>>> dislike this option since the price we pay is very high IMO)
>>>>> 
>>>>> Cheers,
>>>>> Abdullah.
>>>>> 
>>>>>> On Nov 17, 2017, at 11:32 AM, Steven Jacobs <sj...@ucr.edu>
>> wrote:
>>>>>> 
>>>>>> Well, we've solved the problem when there is only one transaction id
>>> per
>>>>>> job. The operators can fetch the transaction ids from the
>>>>>> JobEventListenerFactory (you can find this in master now). The issue
>>> is,
>>>>>> when we are trying to combine multiple job specs into one feed job,
>> the
>>>>>> operators at runtime don't have a memory of which "job spec" they
>>>>>> originally belonged to, which could tell them which of the
>>> transaction
>>>>>> ids they should use.
>>>>>> 
>>>>>> Steven
>>>>>> 
>>>>>> On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <
>>> bamousaa@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> I think that this works, and it seems like the question is how different
>>>>>>> operators in the job can get their transaction ids.
>>>>>>> 
>>>>>>> ~Abdullah.
>>>>>>> 
>>>>>>>> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sj...@ucr.edu>
>>> wrote:
>>>>>>>> 
>>>>>>>> From the conversation, it seems like nobody has the full picture to
>>>>>>> propose
>>>>>>>> the design?
>>>>>>>> For deployed jobs, the idea is to use the same job specification
>> but
>>>>>>> create
>>>>>>>> a new Hyracks job and Asterix Transaction for each execution.
>>>>>>>> 
>>>>>>>> Steven
>>>>>>>> 
>>>>>>>> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <
>>>>> bamousaa@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I can e-meet anytime (moved to Sunnyvale). We can also look at a
>>>>>>> proposed
>>>>>>>>> design and see if it can work
>>>>>>>>> Back to my question, how were you planning to change the
>> transaction
>>>>> id
>>>>>>> if
>>>>>>>>> we forget about the case with multiple datasets (feed job)?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sj...@ucr.edu>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Maybe it would be good to have a meeting about this with all
>>>>> interested
>>>>>>>>>> parties?
>>>>>>>>>> 
>>>>>>>>>> I can be on-campus at UCI on Tuesday if that would be a good day
>> to
>>>>>>> meet.
>>>>>>>>>> 
>>>>>>>>>> Steven
>>>>>>>>>> 
>>>>>>>>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <
>>>>> bamousaa@gmail.com
>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Also, was wondering how would you do the same for a single
>> dataset
>>>>>>>>>>> (non-feed). How would you get the transaction id and change it
>>> when
>>>>>>> you
>>>>>>>>>>> re-run?
>>>>>>>>>>> 
>>>>>>>>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hubailmor@gmail.com
>>> 
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> For atomic transactions, the change was merged yesterday. For
>>>>> entity
>>>>>>>>>>> level
>>>>>>>>>>>> transactions, it should be a very small change.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Murtadha
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <
>>>>> bamousaa@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I understand that is not the case right now but what you're
>>>>> working
>>>>>>>>> on?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Abdullah.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <
>>>>> hubailmor@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> A transaction context can register multiple primary indexes.
>>>>>>>>>>>>>> Since each entity commit log contains the dataset id, you can
>>>>>>>>>>> decrement
>>>>>>>>>>>> the active operations on
>>>>>>>>>>>>>> the operation tracker associated with that dataset id.
>>>>>>>>>>>>>> 
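[Editorial sketch: the mechanism Murtadha describes might be sketched as below. Names are hypothetical, a bare AtomicInteger stands in for the real primary operation tracker, and a ConcurrentHashMap is used only because the log flusher runs on its own thread; this is not the actual transaction-subsystem code.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a single transaction context registers the primary index (and its
// operation tracker) of every dataset it touches. Because each entity commit
// log record carries a dataset id, the log flusher can decrement the active
// operations on the matching tracker.
class TxnContextSketch {
    private final Map<Integer, AtomicInteger> activeOpsByDataset = new ConcurrentHashMap<>();

    // One transaction context may register several primary indexes.
    void registerPrimaryIndex(int datasetId) {
        activeOpsByDataset.computeIfAbsent(datasetId, k -> new AtomicInteger());
    }

    void beginEntityOp(int datasetId) {
        activeOpsByDataset.get(datasetId).incrementAndGet();
    }

    // Invoked by the log flusher thread per entity commit log record,
    // using the dataset id stored in the record.
    void onEntityCommitLogged(int datasetId) {
        activeOpsByDataset.get(datasetId).decrementAndGet();
    }

    int activeOps(int datasetId) {
        return activeOpsByDataset.get(datasetId).get();
    }
}
```
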
>>>>>>>>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <
>>> bamousaa@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to
>>>>> know.
>>>>>>>>>>>>>> Moreover, the reason for the multiple transaction ids in
>> feeds
>>> is
>>>>>>>>>>> not
>>>>>>>>>>>> simply because we compile them differently.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> How would a commit operator know which dataset active
>> operation
>>>>>>>>>>>> counter to decrement if they share the same id for example?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu>
>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Yes. That deadlock could happen. Currently, we have
>> one-to-one
>>>>>>>>>>>> mappings for
>>>>>>>>>>>>>>> the jobs and transactions, except for the feeds.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> @Abdullah, after some digging into the code, I think
>> probably
>>> we
>>>>>>> can
>>>>>>>>>>>> use a
>>>>>>>>>>>>>>> single transaction id for the job which feeds multiple
>>> datasets?
>>>>>>> See
>>>>>>>>>>>> if I
>>>>>>>>>>>>>>> can convince you. :)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The reason we have multiple transaction ids in feeds is that
>>> we
>>>>>>>>>>> compile
>>>>>>>>>>>>>>> each connection job separately and combine them into a
>> single
>>>>> feed
>>>>>>>>>>>> job. A
>>>>>>>>>>>>>>> new transaction id is created and assigned to each
>> connection
>>>>> job,
>>>>>>>>>>>> thus for
>>>>>>>>>>>>>>> the combined job, we have to handle the different
>> transactions
>>>>> as
>>>>>>>>>>> they
>>>>>>>>>>>>>>> are embedded in the connection job specifications. But, what
>>> if
>>>>> we
>>>>>>>>>>>> create a
>>>>>>>>>>>>>>> single transaction id for the combined job? That transaction
>>> id
>>>>>>> will
>>>>>>>>>>> be
>>>>>>>>>>>>>>> embedded into each connection so they can write logs freely,
>>> but
>>>>>>> the
>>>>>>>>>>>>>>> transaction will be started and committed only once as there
>>> is
>>>>>>> only
>>>>>>>>>>>> one
>>>>>>>>>>>>>>> feed job. In this way, we won't need MultiTransactionJobletEventListener
>>>>>>>>>>>>>>> and the transaction id can be removed from the job
>>> specification
>>>>>>>>>>>> easily as
>>>>>>>>>>>>>>> well (for Steven's change).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Xikui
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <
>>> dtabass@gmail.com
>>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I worry about deadlocks.  The waits-for graph may not
>>>>> understand
>>>>>>>>>>> that
>>>>>>>>>>>>>>>> making t1 wait will also make t2 wait since they may share
>> a
>>>>>>> thread
>>>>>>>>>>> -
>>>>>>>>>>>>>>>> right?  Or do we have jobs and transactions separately
>>>>>>> represented
>>>>>>>>>>>> there
>>>>>>>>>>>>>>>> now?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <
>>>>>>> bamousaa@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> We are using multiple transactions in a single job in case
>>> of
>>>>>>> feed
>>>>>>>>>>>> and I
>>>>>>>>>>>>>>>>> think that this is the correct way.
>>>>>>>>>>>>>>>>> Having a single job for a feed that feeds into multiple
>>>>> datasets
>>>>>>>>>>> is a
>>>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>>>> thing since job resources/feed resources are consolidated.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Here are some points:
>>>>>>>>>>>>>>>>> - We can't use the same transaction id to feed multiple
>>>>>>> datasets.
>>>>>>>>>>> The
>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>> other option is to have multiple jobs each feeding a
>>> different
>>>>>>>>>>>> dataset.
>>>>>>>>>>>>>>>>> - Having multiple jobs (in addition to the extra resources
>>>>> used,
>>>>>>>>>>>> memory
>>>>>>>>>>>>>>>>> and CPU) would then force us to either read data from
>>>>> external
>>>>>>>>>>>> sources
>>>>>>>>>>>>>>>>> multiple times, parse records multiple times, etc.,
>>>>>>>>>>>>>>>>> or to have synchronization between the different
>>> jobs
>>>>>>> and
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> feed source within asterixdb. IMO, this is far more
>>>>> complicated
>>>>>>>>>>> than
>>>>>>>>>>>>>>>> having
>>>>>>>>>>>>>>>>> multiple transactions within a single job, and the costs far
>>>>>>>>> outweigh
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> benefits.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> P.S,
>>>>>>>>>>>>>>>>> We are also using this for bucket connections in Couchbase
>>>>>>>>>>> Analytics.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <
>>> tillw@apache.org
>>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> If there are a number of issues with supporting multiple
>>>>>>>>>>> transaction
>>>>>>>>>>>> ids
>>>>>>>>>>>>>>>>>> and no clear benefits/use-cases, I’d vote for
>>> simplification
>>>>> :)
>>>>>>>>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot"
>>> and
>>>>>>> so I
>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>> that its usefulness might be limited by the time we’d
>>> find a
>>>>>>> use
>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> this functionality.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> My 2c,
>>>>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I'm separating the connections into different jobs in
>> some
>>>>> of
>>>>>>> my
>>>>>>>>>>>>>>>>>>> experiments... but that was intended to be used for the
>>>>>>>>>>>> experimental
>>>>>>>>>>>>>>>>>>> settings (i.e., not for master now)...
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I think the interesting question here is whether we want
>>> to
>>>>>>>>> allow
>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>> Hyracks job to carry multiple transactions. I personally
>>>>> think
>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>> be allowed as the transaction and job are two separate
>>>>>>> concepts,
>>>>>>>>>>>> but I
>>>>>>>>>>>>>>>>>>> couldn't find such use cases other than the feeds. Does
>>>>> anyone
>>>>>>>>>>>> have a
>>>>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>>>>>> example on this?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Another question is, if we do allow multiple
>> transactions
>>>>> in a
>>>>>>>>>>>> single
>>>>>>>>>>>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain
>> the
>>>>>>>>>>> correct
>>>>>>>>>>>> TXN
>>>>>>>>>>>>>>>>> id
>>>>>>>>>>>>>>>>>>> without having that embedded as part of the job
>>>>> specification.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Xikui
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>>>>>>>>>>>>>>>> bamousaa@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I am curious as to how feed will work without this?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> ~Abdullah.
>>>>>>>>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <
>>>>>>> sjaco002@ucr.edu
>>>>>>>>>> 
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which
>>>>>>>>>>>>>>>>>>>>> allows for one Hyracks job to run multiple Asterix transactions together.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in the
>>>>>>>>>>>>>>>>>>>>> process of changing to
>>>>>>>>>>>>>>>>>>>>> no longer need this feature. As part of the work in
>>>>>>>>>>> pre-deploying
>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>> specifications to be used by multiple hyracks jobs,
>> I've
>>>>>>> been
>>>>>>>>>>>>>>>> working
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>> removing the transaction id from the job
>> specifications,
>>>>> as
>>>>>>> we
>>>>>>>>>>>> use a
>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> There is currently no clear way to remove the
>>> transaction
>>>>> id
>>>>>>>>>>> from
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>>>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> The question for the group is, do we see a need to
>>>>> maintain
>>>>>>>>>>> this
>>>>>>>>>>>>>>>> class
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> will no longer be used by any current code? Or, in other words,
>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> strong possibility that in the future we will want
>>>>> multiple
>>>>>>>>>>>>>>>>> transactions
>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> share a single Hyracks job, meaning that it is worth
>>>>>>> figuring
>>>>>>>>>>> out
>>>>>>>>>>>>>>>> how
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> maintain this class?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 


Re: MultiTransactionJobletEventListenerFactory

Posted by Xikui Wang <xi...@uci.edu>.
If I understand Abdullah's proposal correctly, for option 1, you can create
a dataset id to transaction id map in the
MultiTransactionJobletEventListener. When committing, the commit runtime
can take the dataset-id to ask for the transaction-id and commit the
sub-transaction. Here we are putting up an assumption that there will not
be a feed connected to a dataset twice in a feed job. This is a fair
assumption in most cases.

But, this doesn't really solve the problem, right? IMHO, if we all agree
that there is a one-to-one mapping from transaction to Hyracks job, we
probably should use a single transaction id for the combined job, i.e.,
option 2... As Murtadha suggested, we can now register multiple resources
with the transaction context with the patch he merged yesterday (it took me
some time to catch up on the transaction codebase so sorry for joining
late.). I think this can offer us a nice and clean solution.

Best,
Xikui

On Fri, Nov 17, 2017 at 11:58 AM, Steven Jacobs <sj...@ucr.edu> wrote:

> If that's true than that solution seems best to me, but we had discussed
> this earlier and Xikui mentioned that that might not be true.
> @Xikui?
> Steven
>
> On Fri, Nov 17, 2017 at 11:55 AM, abdullah alamoudi <ba...@gmail.com>
> wrote:
>
> > Right now, they can't, so datasetId can be safely used.
> > > On Nov 17, 2017, at 11:51 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> > >
> > > For option 1, I think the dataset id is not a unique identifier.
> Couldn't
> > > multiple transactions in one job work on the same dataset?
> > >
> > > Steven
> > >
> > > On Fri, Nov 17, 2017 at 11:38 AM, abdullah alamoudi <
> bamousaa@gmail.com>
> > > wrote:
> > >
> > >> So, there are three options to do this:
> > >> 1. Each of these operators work on a a specific dataset. So we can
> pass
> > >> the datasetId to the JobEventListenerFactory when requesting the
> > >> transaction id.
> > >> 2. We make 1 transaction works for multiple datasets by using a map
> from
> > >> datasetId to primary opTracker and use it when reporting commits by
> the
> > log
> > >> flusher thread.
> > >> 3. Prevent a job from having multiple transactions. (For the record, I
> > >> dislike this option since the price we pay is very high IMO)
> > >>
> > >> Cheers,
> > >> Abdullah.
> > >>
> > >>> On Nov 17, 2017, at 11:32 AM, Steven Jacobs <sj...@ucr.edu>
> wrote:
> > >>>
> > >>> Well, we've solved the problem when there is only one transaction id
> > per
> > >>> job. The operators can fetch the transaction ids from the
> > >>> JobEventListenerFactory (you can find this in master now). The issue
> > is,
> > >>> when we are trying to combine multiple job specs into one feed job,
> the
> > >>> operators at runtime don't have a memory of which "job spec" they
> > >>> originally belonged to which could tell them which one of the
> > transaction
> > >>> ids that they should use.
> > >>>
> > >>> Steven
> > >>>
> > >>> On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <
> > bamousaa@gmail.com>
> > >>> wrote:
> > >>>
> > >>>>
> > >>>> I think that this works and seems like the question is how different
> > >>>> operators in the job can get their transaction ids.
> > >>>>
> > >>>> ~Abdullah.
> > >>>>
> > >>>>> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sj...@ucr.edu>
> > wrote:
> > >>>>>
> > >>>>> From the conversation, it seems like nobody has the full picture to
> > >>>> propose
> > >>>>> the design?
> > >>>>> For deployed jobs, the idea is to use the same job specification
> but
> > >>>> create
> > >>>>> a new Hyracks job and Asterix Transaction for each execution.
> > >>>>>
> > >>>>> Steven
> > >>>>>
> > >>>>> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <
> > >> bamousaa@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> I can e-meet anytime (moved to Sunnyvale). We can also look at a
> > >>>> proposed
> > >>>>>> design and see if it can work
> > >>>>>> Back to my question, how were you planning to change the
> transaction
> > >> id
> > >>>> if
> > >>>>>> we forget about the case with multiple datasets (feed job)?
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sj...@ucr.edu>
> > >> wrote:
> > >>>>>>>
> > >>>>>>> Maybe it would be good to have a meeting about this with all
> > >> interested
> > >>>>>>> parties?
> > >>>>>>>
> > >>>>>>> I can be on-campus at UCI on Tuesday if that would be a good day
> to
> > >>>> meet.
> > >>>>>>>
> > >>>>>>> Steven
> > >>>>>>>
> > >>>>>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <
> > >> bamousaa@gmail.com
> > >>>>>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Also, was wondering how would you do the same for a single
> dataset
> > >>>>>>>> (non-feed). How would you get the transaction id and change it
> > when
> > >>>> you
> > >>>>>>>> re-run?
> > >>>>>>>>
> > >>>>>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hubailmor@gmail.com
> >
> > >>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> For atomic transactions, the change was merged yesterday. For
> > >> entity
> > >>>>>>>> level
> > >>>>>>>>> transactions, it should be a very small change.
> > >>>>>>>>>
> > >>>>>>>>> Cheers,
> > >>>>>>>>> Murtadha
> > >>>>>>>>>
> > >>>>>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <
> > >> bamousaa@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> I understand that is not the case right now but what you're
> > >> working
> > >>>>>> on?
> > >>>>>>>>>>
> > >>>>>>>>>> Cheers,
> > >>>>>>>>>> Abdullah.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <
> > >> hubailmor@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> A transaction context can register multiple primary indexes.
> > >>>>>>>>>>> Since each entity commit log contains the dataset id, you can
> > >>>>>>>> decrement
> > >>>>>>>>> the active operations on
> > >>>>>>>>>>> the operation tracker associated with that dataset id.
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <
> > bamousaa@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to
> > >> know.
> > >>>>>>>>>>> Moreover, the reason for the multiple transaction ids in
> feeds
> > is
> > >>>>>>>> not
> > >>>>>>>>> simply because we compile them differently.
> > >>>>>>>>>>>
> > >>>>>>>>>>> How would a commit operator know which dataset active
> operation
> > >>>>>>>>> counter to decrement if they share the same id for example?
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu>
> > wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Yes. That deadlock could happen. Currently, we have
> one-to-one
> > >>>>>>>>> mappings for
> > >>>>>>>>>>>> the jobs and transactions, except for the feeds.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> @Abdullah, after some digging into the code, I think
> probably
> > we
> > >>>> can
> > >>>>>>>>> use a
> > >>>>>>>>>>>> single transaction id for the job which feeds multiple
> > datasets?
> > >>>> See
> > >>>>>>>>> if I
> > >>>>>>>>>>>> can convince you. :)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The reason we have multiple transaction ids in feeds is that
> > we
> > >>>>>>>> compile
> > >>>>>>>>>>>> each connection job separately and combine them into a
> single
> > >> feed
> > >>>>>>>>> job. A
> > >>>>>>>>>>>> new transaction id is created and assigned to each
> connection
> > >> job,
> > >>>>>>>>> thus for
> > >>>>>>>>>>>> the combined job, we have to handle the different
> transactions
> > >> as
> > >>>>>>>> they
> > >>>>>>>>>>>> are embedded in the connection job specifications. But, what
> > if
> > >> we
> > >>>>>>>>> create a
> > >>>>>>>>>>>> single transaction id for the combined job? That transaction
> > id
> > >>>> will
> > >>>>>>>> be
> > >>>>>>>>>>>> embedded into each connection so they can write logs freely,
> > but
> > >>>> the
> > >>>>>>>>>>>> transaction will be started and committed only once as there
> > is
> > >>>> only
> > >>>>>>>>> one
> > >>>>>>>>>>>> feed job. In this way, we won't need
> > >>>> multiTransactionJobletEventLis
> > >>>>>>>>> tener
> > >>>>>>>>>>>> and the transaction id can be removed from the job
> > specification
> > >>>>>>>>> easily as
> > >>>>>>>>>>>> well (for Steven's change).
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>> Xikui
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <
> > dtabass@gmail.com
> > >>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I worry about deadlocks.  The waits for graph may not
> > >> understand
> > >>>>>>>> that
> > >>>>>>>>>>>>> making t1 wait will also make t2 wait since they may share
> a
> > >>>> thread
> > >>>>>>>> -
> > >>>>>>>>>>>>> right?  Or do we have jobs and transactions separately
> > >>>> represented
> > >>>>>>>>> there
> > >>>>>>>>>>>>> now?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <
> > >>>> bamousaa@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> We are using multiple transactions in a single job in case
> > of
> > >>>> feed
> > >>>>>>>>> and I
> > >>>>>>>>>>>>>> think that this is the correct way.
> > >>>>>>>>>>>>>> Having a single job for a feed that feeds into multiple
> > >> datasets
> > >>>>>>>> is a
> > >>>>>>>>>>>>> good
> > >>>>>>>>>>>>>> thing since job resources/feed resources are consolidated.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Here are some points:
> > >>>>>>>>>>>>>> - We can't use the same transaction id to feed multiple
> > >>>> datasets.
> > >>>>>>>> The
> > >>>>>>>>>>>>> only
> > >>>>>>>>>>>>>> other option is to have multiple jobs each feeding a
> > different
> > >>>>>>>>> dataset.
> > >>>>>>>>>>>>>> - Having multiple jobs (in addition to the extra resources
> > >> used,
> > >>>>>>>>> memory
> > >>>>>>>>>>>>>> and CPU) would then forces us to either read data from
> > >> external
> > >>>>>>>>> sources
> > >>>>>>>>>>>>>> multiple times, parse records multiple times, etc
> > >>>>>>>>>>>>>> or having to have a synchronization between the different
> > jobs
> > >>>> and
> > >>>>>>>>> the
> > >>>>>>>>>>>>>> feed source within asterixdb. IMO, this is far more
> > >> complicated
> > >>>>>>>> than
> > >>>>>>>>>>>>> having
> > >>>>>>>>>>>>>> multiple transactions within a single job and the cost far
> > >>>>>> outweigh
> > >>>>>>>>> the
> > >>>>>>>>>>>>>> benefits.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> P.S,
> > >>>>>>>>>>>>>> We are also using this for bucket connections in Couchbase
> > >>>>>>>> Analytics.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <
> > tillw@apache.org
> > >>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> If there are a number of issue with supporting multiple
> > >>>>>>>> transaction
> > >>>>>>>>> ids
> > >>>>>>>>>>>>>>> and no clear benefits/use-cases, I’d vote for
> > simplification
> > >> :)
> > >>>>>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot"
> > and
> > >>>> so I
> > >>>>>>>>> think
> > >>>>>>>>>>>>>>> that it’s usefulness might be limited by the time we’d
> > find a
> > >>>> use
> > >>>>>>>>> for
> > >>>>>>>>>>>>>>> this functionality.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> My 2c,
> > >>>>>>>>>>>>>>> Till
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I'm separating the connections into different jobs in
> some
> > >> of
> > >>>> my
> > >>>>>>>>>>>>>>>> experiments... but that was intended to be used for the
> > >>>>>>>>> experimental
> > >>>>>>>>>>>>>>>> settings (i.e., not for master now)...
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I think the interesting question here is whether we want
> > to
> > >>>>>> allow
> > >>>>>>>>> one
> > >>>>>>>>>>>>>>>> Hyracks job to carry multiple transactions. I personally
> > >> think
> > >>>>>>>> that
> > >>>>>>>>>>>>>> should
> > >>>>>>>>>>>>>>>> be allowed as the transaction and job are two separate
> > >>>> concepts,
> > >>>>>>>>> but I
> > >>>>>>>>>>>>>>>> couldn't find such use cases other than the feeds. Does
> > >> anyone
> > >>>>>>>>> have a
> > >>>>>>>>>>>>>> good
> > >>>>>>>>>>>>>>>> example on this?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Another question is, if we do allow multiple
> transactions
> > >> in a
> > >>>>>>>>> single
> > >>>>>>>>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain
> the
> > >>>>>>>> correct
> > >>>>>>>>> TXN
> > >>>>>>>>>>>>>> id
> > >>>>>>>>>>>>>>>> without having that embedded as part of the job
> > >> specification.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>> Xikui
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
> > >>>>>>>>>>>>> bamousaa@gmail.com>
> > >>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I am curious as to how feed will work without this?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> ~Abdullah.
> > >>>>>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <
> > >>>> sjaco002@ucr.edu
> > >>>>>>>
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventLis
> > >>>> tenerFactory,
> > >>>>>>>>> which
> > >>>>>>>>>>>>>>>>> allows
> > >>>>>>>>>>>>>>>>>> for one Hyracks job to run multiple Asterix
> transactions
> > >>>>>>>>> together.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in
> > process
> > >>>> of
> > >>>>>>>>>>>>>> changing to
> > >>>>>>>>>>>>>>>>>> no longer need this feature. As part of the work in
> > >>>>>>>> pre-deploying
> > >>>>>>>>>>>>> job
> > >>>>>>>>>>>>>>>>>> specifications to be used by multiple hyracks jobs,
> I've
> > >>>> been
> > >>>>>>>>>>>>> working
> > >>>>>>>>>>>>>> on
> > >>>>>>>>>>>>>>>>>> removing the transaction id from the job
> specifications,
> > >> as
> > >>>> we
> > >>>>>>>>> use a
> > >>>>>>>>>>>>>> new
> > >>>>>>>>>>>>>>>>>> transaction for each invocation of a deployed job.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> There is currently no clear way to remove the
> > transaction
> > >> id
> > >>>>>>>> from
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>> job
> > >>>>>>>>>>>>>>>>>> spec and keep the option for
> > >> MultiTransactionJobletEventLis
> > >>>>>>>>>>>>>> tenerFactory.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> The question for the group is, do we see a need to
> > >> maintain
> > >>>>>>>> this
> > >>>>>>>>>>>>> class
> > >>>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>> will no longer be used by any current code? Or, an
> other
> > >>>>>> words,
> > >>>>>>>>> is
> > >>>>>>>>>>>>>> there
> > >>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>> strong possibility that in the future we will want
> > >> multiple
> > >>>>>>>>>>>>>> transactions
> > >>>>>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>> share a single Hyracks job, meaning that it is worth
> > >>>> figuring
> > >>>>>>>> out
> > >>>>>>>>>>>>> how
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>> maintain this class?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Steven
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >
>

Re: MultiTransactionJobletEventListenerFactory

Posted by Steven Jacobs <sj...@ucr.edu>.
If that's true then that solution seems best to me, but we had discussed
this earlier and Xikui mentioned that it might not be true.
@Xikui?
Steven

On Fri, Nov 17, 2017 at 11:55 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> Right now, they can't, so datasetId can be safely used.
> > On Nov 17, 2017, at 11:51 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> >
> > For option 1, I think the dataset id is not a unique identifier. Couldn't
> > multiple transactions in one job work on the same dataset?
> >
> > Steven
> >
> > On Fri, Nov 17, 2017 at 11:38 AM, abdullah alamoudi <ba...@gmail.com>
> > wrote:
> >
> >> So, there are three options to do this:
> >> 1. Each of these operators works on a specific dataset. So we can pass
> >> the datasetId to the JobEventListenerFactory when requesting the
> >> transaction id.
> >> 2. We make one transaction work for multiple datasets by using a map from
> >> datasetId to primary opTracker and use it when reporting commits by the
> log
> >> flusher thread.
> >> 3. Prevent a job from having multiple transactions. (For the record, I
> >> dislike this option since the price we pay is very high IMO)
> >>
> >> Cheers,
> >> Abdullah.
> >>
> >>> On Nov 17, 2017, at 11:32 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> >>>
> >>> Well, we've solved the problem when there is only one transaction id
> per
> >>> job. The operators can fetch the transaction ids from the
> >>> JobEventListenerFactory (you can find this in master now). The issue
> is,
> >>> when we are trying to combine multiple job specs into one feed job, the
> >>> operators at runtime don't have a memory of which "job spec" they
> >>> originally belonged to which could tell them which one of the
> transaction
> >>> ids that they should use.
> >>>
> >>> Steven
> >>>
> >>> On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <
> bamousaa@gmail.com>
> >>> wrote:
> >>>
> >>>>
> >>>> I think that this works and seems like the question is how different
> >>>> operators in the job can get their transaction ids.
> >>>>
> >>>> ~Abdullah.
> >>>>
> >>>>> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sj...@ucr.edu>
> wrote:
> >>>>>
> >>>>> From the conversation, it seems like nobody has the full picture to
> >>>> propose
> >>>>> the design?
> >>>>> For deployed jobs, the idea is to use the same job specification but
> >>>> create
> >>>>> a new Hyracks job and Asterix Transaction for each execution.
> >>>>>
> >>>>> Steven
> >>>>>
> >>>>> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <
> >> bamousaa@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> I can e-meet anytime (moved to Sunnyvale). We can also look at a
> >>>> proposed
> >>>>>> design and see if it can work
> >>>>>> Back to my question, how were you planning to change the transaction
> >> id
> >>>> if
> >>>>>> we forget about the case with multiple datasets (feed job)?
> >>>>>>
> >>>>>>
> >>>>>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sj...@ucr.edu>
> >> wrote:
> >>>>>>>
> >>>>>>> Maybe it would be good to have a meeting about this with all
> >> interested
> >>>>>>> parties?
> >>>>>>>
> >>>>>>> I can be on-campus at UCI on Tuesday if that would be a good day to
> >>>> meet.
> >>>>>>>
> >>>>>>> Steven
> >>>>>>>
> >>>>>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <
> >> bamousaa@gmail.com
> >>>>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Also, was wondering how would you do the same for a single dataset
> >>>>>>>> (non-feed). How would you get the transaction id and change it
> when
> >>>> you
> >>>>>>>> re-run?
> >>>>>>>>
> >>>>>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hu...@gmail.com>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>>> For atomic transactions, the change was merged yesterday. For
> >> entity
> >>>>>>>> level
> >>>>>>>>> transactions, it should be a very small change.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Murtadha
> >>>>>>>>>
> >>>>>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <
> >> bamousaa@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I understand that is not the case right now but what you're
> >> working
> >>>>>> on?
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Abdullah.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <
> >> hubailmor@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> A transaction context can register multiple primary indexes.
> >>>>>>>>>>> Since each entity commit log contains the dataset id, you can
> >>>>>>>> decrement
> >>>>>>>>> the active operations on
> >>>>>>>>>>> the operation tracker associated with that dataset id.
> >>>>>>>>>>>
> >>>>>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <
> bamousaa@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to
> >> know.
> >>>>>>>>>>> Moreover, the reason for the multiple transaction ids in feeds
> is
> >>>>>>>> not
> >>>>>>>>> simply because we compile them differently.
> >>>>>>>>>>>
> >>>>>>>>>>> How would a commit operator know which dataset active operation
> >>>>>>>>> counter to decrement if they share the same id for example?
> >>>>>>>>>>>
> >>>>>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu>
> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
> >>>>>>>>> mappings for
> >>>>>>>>>>>> the jobs and transactions, except for the feeds.
> >>>>>>>>>>>>
> >>>>>>>>>>>> @Abdullah, after some digging into the code, I think probably
> we
> >>>> can
> >>>>>>>>> use a
> >>>>>>>>>>>> single transaction id for the job which feeds multiple
> datasets?
> >>>> See
> >>>>>>>>> if I
> >>>>>>>>>>>> can convince you. :)
> >>>>>>>>>>>>
> >>>>>>>>>>>> The reason we have multiple transaction ids in feeds is that
> we
> >>>>>>>> compile
> >>>>>>>>>>>> each connection job separately and combine them into a single
> >> feed
> >>>>>>>>> job. A
> >>>>>>>>>>>> new transaction id is created and assigned to each connection
> >> job,
> >>>>>>>>> thus for
> >>>>>>>>>>>> the combined job, we have to handle the different transactions
> >> as
> >>>>>>>> they
> >>>>>>>>>>>> are embedded in the connection job specifications. But, what
> if
> >> we
> >>>>>>>>> create a
> >>>>>>>>>>>> single transaction id for the combined job? That transaction
> id
> >>>> will
> >>>>>>>> be
> >>>>>>>>>>>> embedded into each connection so they can write logs freely,
> but
> >>>> the
> >>>>>>>>>>>> transaction will be started and committed only once as there
> is
> >>>> only
> >>>>>>>>> one
> >>>>>>>>>>>> feed job. In this way, we won't need
> >>>> multiTransactionJobletEventLis
> >>>>>>>>> tener
> >>>>>>>>>>>> and the transaction id can be removed from the job
> specification
> >>>>>>>>> easily as
> >>>>>>>>>>>> well (for Steven's change).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Xikui
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <
> dtabass@gmail.com
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I worry about deadlocks.  The waits for graph may not
> >> understand
> >>>>>>>> that
> >>>>>>>>>>>>> making t1 wait will also make t2 wait since they may share a
> >>>> thread
> >>>>>>>> -
> >>>>>>>>>>>>> right?  Or do we have jobs and transactions separately
> >>>> represented
> >>>>>>>>> there
> >>>>>>>>>>>>> now?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <
> >>>> bamousaa@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> We are using multiple transactions in a single job in case
> of
> >>>> feed
> >>>>>>>>> and I
> >>>>>>>>>>>>>> think that this is the correct way.
> >>>>>>>>>>>>>> Having a single job for a feed that feeds into multiple
> >> datasets
> >>>>>>>> is a
> >>>>>>>>>>>>> good
> >>>>>>>>>>>>>> thing since job resources/feed resources are consolidated.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Here are some points:
> >>>>>>>>>>>>>> - We can't use the same transaction id to feed multiple
> >>>> datasets.
> >>>>>>>> The
> >>>>>>>>>>>>> only
> >>>>>>>>>>>>>> other option is to have multiple jobs each feeding a
> different
> >>>>>>>>> dataset.
> >>>>>>>>>>>>>> - Having multiple jobs (in addition to the extra resources
> >> used,
> >>>>>>>>> memory
> >>>>>>>>>>>>>> and CPU) would then force us to either read data from
> >> external
> >>>>>>>>> sources
> >>>>>>>>>>>>>> multiple times, parse records multiple times, etc
> >>>>>>>>>>>>>> or having to have a synchronization between the different
> jobs
> >>>> and
> >>>>>>>>> the
> >>>>>>>>>>>>>> feed source within asterixdb. IMO, this is far more
> >> complicated
> >>>>>>>> than
> >>>>>>>>>>>>> having
> >>>>>>>>>>>>>> multiple transactions within a single job and the costs far
> >>>>>> outweigh
> >>>>>>>>> the
> >>>>>>>>>>>>>> benefits.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> P.S,
> >>>>>>>>>>>>>> We are also using this for bucket connections in Couchbase
> >>>>>>>> Analytics.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <
> tillw@apache.org
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If there are a number of issues with supporting multiple
> >>>>>>>> transaction
> >>>>>>>>> ids
> >>>>>>>>>>>>>>> and no clear benefits/use-cases, I’d vote for
> simplification
> >> :)
> >>>>>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot"
> and
> >>>> so I
> >>>>>>>>> think
> >>>>>>>>>>>>>>> that its usefulness might be limited by the time we’d find a
> find a
> >>>> use
> >>>>>>>>> for
> >>>>>>>>>>>>>>> this functionality.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> My 2c,
> >>>>>>>>>>>>>>> Till
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm separating the connections into different jobs in some
> >> of
> >>>> my
> >>>>>>>>>>>>>>>> experiments... but that was intended to be used for the
> >>>>>>>>> experimental
> >>>>>>>>>>>>>>>> settings (i.e., not for master now)...
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think the interesting question here is whether we want
> to
> >>>>>> allow
> >>>>>>>>> one
> >>>>>>>>>>>>>>>> Hyracks job to carry multiple transactions. I personally
> >> think
> >>>>>>>> that
> >>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>> be allowed as the transaction and job are two separate
> >>>> concepts,
> >>>>>>>>> but I
> >>>>>>>>>>>>>>>> couldn't find such use cases other than the feeds. Does
> >> anyone
> >>>>>>>>> have a
> >>>>>>>>>>>>>> good
> >>>>>>>>>>>>>>>> example on this?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Another question is, if we do allow multiple transactions
> >> in a
> >>>>>>>>> single
> >>>>>>>>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain the
> >>>>>>>> correct
> >>>>>>>>> TXN
> >>>>>>>>>>>>>> id
> >>>>>>>>>>>>>>>> without having that embedded as part of the job
> >> specification.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Xikui
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
> >>>>>>>>>>>>> bamousaa@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I am curious as to how feed will work without this?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> ~Abdullah.
> >>>>>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <
> >>>> sjaco002@ucr.edu
> >>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventLis
> >>>> tenerFactory,
> >>>>>>>>> which
> >>>>>>>>>>>>>>>>> allows
> >>>>>>>>>>>>>>>>>> for one Hyracks job to run multiple Asterix transactions
> >>>>>>>>> together.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in
> process
> >>>> of
> >>>>>>>>>>>>>> changing to
> >>>>>>>>>>>>>>>>>> no longer need this feature. As part of the work in
> >>>>>>>> pre-deploying
> >>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>> specifications to be used by multiple hyracks jobs, I've
> >>>> been
> >>>>>>>>>>>>> working
> >>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>> removing the transaction id from the job specifications,
> >> as
> >>>> we
> >>>>>>>>> use a
> >>>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>> transaction for each invocation of a deployed job.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> There is currently no clear way to remove the
> transaction
> >> id
> >>>>>>>> from
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>> job
> >>>>>>>>>>>>>>>>>> spec and keep the option for
> >> MultiTransactionJobletEventLis
> >>>>>>>>>>>>>> tenerFactory.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The question for the group is, do we see a need to
> >> maintain
> >>>>>>>> this
> >>>>>>>>>>>>> class
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>> will no longer be used by any current code? Or, in other
> >>>>>> words,
> >>>>>>>>> is
> >>>>>>>>>>>>>> there
> >>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>> strong possibility that in the future we will want
> >> multiple
> >>>>>>>>>>>>>> transactions
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> share a single Hyracks job, meaning that it is worth
> >>>> figuring
> >>>>>>>> out
> >>>>>>>>>>>>> how
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> maintain this class?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Steven
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
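Murtadha's point in the quoted exchange above — that each entity-commit log record carries its dataset id, so commit processing can decrement the right operation tracker even when several datasets share one transaction — can be sketched roughly as follows. This is an illustrative Java sketch only: the class and method names are invented and do not correspond to the actual AsterixDB code, and note that this routing does a map lookup once per record rather than once per job.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of "option 2": one transaction spanning multiple datasets, with a
// datasetId -> primary operation tracker map consulted when the log flusher
// reports an entity commit. All names here are hypothetical.
public class SingleTxnMultiDatasetSketch {

    /** Stand-in for a primary index operation tracker. */
    public static final class OpTracker {
        public final AtomicInteger activeOps = new AtomicInteger();
        public void beginOperation() { activeOps.incrementAndGet(); }
        public void completeOperation() { activeOps.decrementAndGet(); }
    }

    private final Map<Integer, OpTracker> trackers = new HashMap<>();

    /** Lazily create one tracker per dataset connected to the job. */
    public OpTracker trackerFor(int datasetId) {
        return trackers.computeIfAbsent(datasetId, id -> new OpTracker());
    }

    /**
     * Called for each entity-commit log record; the record carries its
     * dataset id, so the right tracker can be found even though every
     * dataset shares the single transaction id.
     */
    public void onEntityCommit(int datasetId) {
        trackerFor(datasetId).completeOperation();
    }

    public static void main(String[] args) {
        SingleTxnMultiDatasetSketch sketch = new SingleTxnMultiDatasetSketch();
        sketch.trackerFor(1).beginOperation();
        sketch.trackerFor(2).beginOperation();
        sketch.onEntityCommit(1); // dataset 1's active count drops back to 0
        System.out.println(sketch.trackerFor(1).activeOps.get());
    }
}
```

The trade-off flagged earlier in the thread applies here: the per-dataset lookup happens on every committed record, not once per job.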

Re: MultiTransactionJobletEventListenerFactory

Posted by abdullah alamoudi <ba...@gmail.com>.
Right now, they can't, so datasetId can be safely used.
> On Nov 17, 2017, at 11:51 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> 
> For option 1, I think the dataset id is not a unique identifier. Couldn't
> multiple transactions in one job work on the same dataset?
> 
> Steven
> 
> On Fri, Nov 17, 2017 at 11:38 AM, abdullah alamoudi <ba...@gmail.com>
> wrote:
> 
>> So, there are three options to do this:
>> 1. Each of these operators works on a specific dataset. So we can pass
>> the datasetId to the JobEventListenerFactory when requesting the
>> transaction id.
>> 2. We make one transaction work for multiple datasets by using a map from
>> datasetId to primary opTracker and use it when reporting commits by the log
>> flusher thread.
>> 3. Prevent a job from having multiple transactions. (For the record, I
>> dislike this option since the price we pay is very high IMO)
>> 
>> Cheers,
>> Abdullah.
>> 
>>> On Nov 17, 2017, at 11:32 AM, Steven Jacobs <sj...@ucr.edu> wrote:
>>> 
>>> Well, we've solved the problem when there is only one transaction id per
>>> job. The operators can fetch the transaction ids from the
>>> JobEventListenerFactory (you can find this in master now). The issue is,
>>> when we are trying to combine multiple job specs into one feed job, the
>>> operators at runtime don't have a memory of which "job spec" they
>>> originally belonged to which could tell them which one of the transaction
>>> ids that they should use.
>>> 
>>> Steven
>>> 
>>> On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <ba...@gmail.com>
>>> wrote:
>>> 
>>>> 
>>>> I think that this works and seems like the question is how different
>>>> operators in the job can get their transaction ids.
>>>> 
>>>> ~Abdullah.
>>>> 
>>>>> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sj...@ucr.edu> wrote:
>>>>> 
>>>>> From the conversation, it seems like nobody has the full picture to
>>>> propose
>>>>> the design?
>>>>> For deployed jobs, the idea is to use the same job specification but
>>>> create
>>>>> a new Hyracks job and Asterix Transaction for each execution.
>>>>> 
>>>>> Steven
>>>>> 
>>>>> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <
>> bamousaa@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I can e-meet anytime (moved to Sunnyvale). We can also look at a
>>>> proposed
>>>>>> design and see if it can work
>>>>>> Back to my question, how were you planning to change the transaction
>> id
>>>> if
>>>>>> we forget about the case with multiple datasets (feed job)?
>>>>>> 
>>>>>> 
>>>>>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sj...@ucr.edu>
>> wrote:
>>>>>>> 
>>>>>>> Maybe it would be good to have a meeting about this with all
>> interested
>>>>>>> parties?
>>>>>>> 
>>>>>>> I can be on-campus at UCI on Tuesday if that would be a good day to
>>>> meet.
>>>>>>> 
>>>>>>> Steven
>>>>>>> 
>>>>>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <
>> bamousaa@gmail.com
>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Also, was wondering how would you do the same for a single dataset
>>>>>>>> (non-feed). How would you get the transaction id and change it when
>>>> you
>>>>>>>> re-run?
>>>>>>>> 
>>>>>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hu...@gmail.com>
>>>> wrote:
>>>>>>>> 
>>>>>>>>> For atomic transactions, the change was merged yesterday. For
>> entity
>>>>>>>> level
>>>>>>>>> transactions, it should be a very small change.
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Murtadha
>>>>>>>>> 
>>>>>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <
>> bamousaa@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> I understand that is not the case right now but what you're
>> working
>>>>>> on?
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> Abdullah.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <
>> hubailmor@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> A transaction context can register multiple primary indexes.
>>>>>>>>>>> Since each entity commit log contains the dataset id, you can
>>>>>>>> decrement
>>>>>>>>> the active operations on
>>>>>>>>>>> the operation tracker associated with that dataset id.
>>>>>>>>>>> 
>>>>>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to
>> know.
>>>>>>>>>>> Moreover, the reason for the multiple transaction ids in feeds is
>>>>>>>> not
>>>>>>>>> simply because we compile them differently.
>>>>>>>>>>> 
>>>>>>>>>>> How would a commit operator know which dataset active operation
>>>>>>>>> counter to decrement if they share the same id for example?
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
>>>>>>>>> mappings for
>>>>>>>>>>>> the jobs and transactions, except for the feeds.
>>>>>>>>>>>> 
>>>>>>>>>>>> @Abdullah, after some digging into the code, I think probably we
>>>> can
>>>>>>>>> use a
>>>>>>>>>>>> single transaction id for the job which feeds multiple datasets?
>>>> See
>>>>>>>>> if I
>>>>>>>>>>>> can convince you. :)
>>>>>>>>>>>> 
>>>>>>>>>>>> The reason we have multiple transaction ids in feeds is that we
>>>>>>>> compile
>>>>>>>>>>>> each connection job separately and combine them into a single
>> feed
>>>>>>>>> job. A
>>>>>>>>>>>> new transaction id is created and assigned to each connection
>> job,
>>>>>>>>> thus for
>>>>>>>>>>>> the combined job, we have to handle the different transactions
>> as
>>>>>>>> they
>>>>>>>>>>>> are embedded in the connection job specifications. But, what if
>> we
>>>>>>>>> create a
>>>>>>>>>>>> single transaction id for the combined job? That transaction id
>>>> will
>>>>>>>> be
>>>>>>>>>>>> embedded into each connection so they can write logs freely, but
>>>> the
>>>>>>>>>>>> transaction will be started and committed only once as there is
>>>> only
>>>>>>>>> one
>>>>>>>>>>>> feed job. In this way, we won't need
>>>> multiTransactionJobletEventLis
>>>>>>>>> tener
>>>>>>>>>>>> and the transaction id can be removed from the job specification
>>>>>>>>> easily as
>>>>>>>>>>>> well (for Steven's change).
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Xikui
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dtabass@gmail.com
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I worry about deadlocks.  The waits for graph may not
>> understand
>>>>>>>> that
>>>>>>>>>>>>> making t1 wait will also make t2 wait since they may share a
>>>> thread
>>>>>>>> -
>>>>>>>>>>>>> right?  Or do we have jobs and transactions separately
>>>> represented
>>>>>>>>> there
>>>>>>>>>>>>> now?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <
>>>> bamousaa@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We are using multiple transactions in a single job in case of
>>>> feed
>>>>>>>>> and I
>>>>>>>>>>>>>> think that this is the correct way.
>>>>>>>>>>>>>> Having a single job for a feed that feeds into multiple
>> datasets
>>>>>>>> is a
>>>>>>>>>>>>> good
>>>>>>>>>>>>>> thing since job resources/feed resources are consolidated.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Here are some points:
>>>>>>>>>>>>>> - We can't use the same transaction id to feed multiple
>>>> datasets.
>>>>>>>> The
>>>>>>>>>>>>> only
>>>>>>>>>>>>>> other option is to have multiple jobs each feeding a different
>>>>>>>>> dataset.
>>>>>>>>>>>>>> - Having multiple jobs (in addition to the extra resources
>> used,
>>>>>>>>> memory
>>>>>>>>>>>>>> and CPU) would then force us to either read data from
>> external
>>>>>>>>> sources
>>>>>>>>>>>>>> multiple times, parse records multiple times, etc
>>>>>>>>>>>>>> or having to have a synchronization between the different jobs
>>>> and
>>>>>>>>> the
>>>>>>>>>>>>>> feed source within asterixdb. IMO, this is far more
>> complicated
>>>>>>>> than
>>>>>>>>>>>>> having
>>>>>>>>>>>>>> multiple transactions within a single job and the costs far
>>>>>> outweigh
>>>>>>>>> the
>>>>>>>>>>>>>> benefits.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> P.S,
>>>>>>>>>>>>>> We are also using this for bucket connections in Couchbase
>>>>>>>> Analytics.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <tillw@apache.org
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If there are a number of issues with supporting multiple
>>>>>>>> transaction
>>>>>>>>> ids
>>>>>>>>>>>>>>> and no clear benefits/use-cases, I’d vote for simplification
>> :)
>>>>>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot" and
>>>> so I
>>>>>>>>> think
>>>>>>>>>>>>>>> that its usefulness might be limited by the time we’d find a
>>>> use
>>>>>>>>> for
>>>>>>>>>>>>>>> this functionality.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> My 2c,
>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I'm separating the connections into different jobs in some
>> of
>>>> my
>>>>>>>>>>>>>>>> experiments... but that was intended to be used for the
>>>>>>>>> experimental
>>>>>>>>>>>>>>>> settings (i.e., not for master now)...
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think the interesting question here is whether we want to
>>>>>> allow
>>>>>>>>> one
>>>>>>>>>>>>>>>> Hyracks job to carry multiple transactions. I personally
>> think
>>>>>>>> that
>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>> be allowed as the transaction and job are two separate
>>>> concepts,
>>>>>>>>> but I
>>>>>>>>>>>>>>>> couldn't find such use cases other than the feeds. Does
>> anyone
>>>>>>>>> have a
>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>>> example on this?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Another question is, if we do allow multiple transactions
>> in a
>>>>>>>>> single
>>>>>>>>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain the
>>>>>>>> correct
>>>>>>>>> TXN
>>>>>>>>>>>>>> id
>>>>>>>>>>>>>>>> without having that embedded as part of the job
>> specification.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Xikui
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>>>>>>>>>>>>> bamousaa@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I am curious as to how feed will work without this?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> ~Abdullah.
>>>>>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <
>>>> sjaco002@ucr.edu
>>>>>>> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventLis
>>>> tenerFactory,
>>>>>>>>> which
>>>>>>>>>>>>>>>>> allows
>>>>>>>>>>>>>>>>>> for one Hyracks job to run multiple Asterix transactions
>>>>>>>>> together.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in process
>>>> of
>>>>>>>>>>>>>> changing to
>>>>>>>>>>>>>>>>>> no longer need this feature. As part of the work in
>>>>>>>> pre-deploying
>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>> specifications to be used by multiple hyracks jobs, I've
>>>> been
>>>>>>>>>>>>> working
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>> removing the transaction id from the job specifications,
>> as
>>>> we
>>>>>>>>> use a
>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> There is currently no clear way to remove the transaction
>> id
>>>>>>>> from
>>>>>>>>>>>>> the
>>>>>>>>>>>>>> job
>>>>>>>>>>>>>>>>>> spec and keep the option for
>> MultiTransactionJobletEventLis
>>>>>>>>>>>>>> tenerFactory.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The question for the group is, do we see a need to
>> maintain
>>>>>>>> this
>>>>>>>>>>>>> class
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> will no longer be used by any current code? Or, in other
>>>>>> words,
>>>>>>>>> is
>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>> strong possibility that in the future we will want
>> multiple
>>>>>>>>>>>>>> transactions
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> share a single Hyracks job, meaning that it is worth
>>>> figuring
>>>>>>>> out
>>>>>>>>>>>>> how
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> maintain this class?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 
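Option 1 as discussed above — each commit operator passes its datasetId to the job event listener to look up the owning transaction id, once per job — could look roughly like the sketch below. The names are hypothetical and do not match the real MultiTransactionJobletEventListenerFactory API; the sketch also assumes, per Abdullah's reply, that a job never connects the same dataset twice, so datasetId is a safe key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "option 1": the joblet event listener keeps a
// datasetId -> transaction id map, so a commit runtime can resolve its
// transaction with a single lookup per job. Illustrative names only.
public class MultiTxnListenerSketch {

    /** One Asterix transaction id per dataset connected by the job. */
    private final Map<Integer, Long> datasetToTxnId = new HashMap<>();

    /** Called once per dataset when the combined feed job is assembled. */
    public void registerTransaction(int datasetId, long txnId) {
        datasetToTxnId.put(datasetId, txnId);
    }

    /**
     * Called once per job by a commit runtime that knows which dataset it
     * writes to. Valid only while a feed cannot connect to the same
     * dataset twice within one job.
     */
    public long getTxnId(int datasetId) {
        Long txnId = datasetToTxnId.get(datasetId);
        if (txnId == null) {
            throw new IllegalStateException(
                    "no transaction registered for dataset " + datasetId);
        }
        return txnId;
    }

    public static void main(String[] args) {
        MultiTxnListenerSketch listener = new MultiTxnListenerSketch();
        listener.registerTransaction(101, 7L);
        listener.registerTransaction(102, 8L);
        System.out.println(listener.getTxnId(101));
    }
}
```

Compared with routing every entity-commit record through a map, this keeps the lookup off the per-record path, which is the cost difference raised earlier in the thread.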


Re: MultiTransactionJobletEventListenerFactory

Posted by Steven Jacobs <sj...@ucr.edu>.
For option 1, I think the dataset id is not a unique identifier. Couldn't
multiple transactions in one job work on the same dataset?

Steven

On Fri, Nov 17, 2017 at 11:38 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> So, there are three options to do this:
> 1. Each of these operators works on a specific dataset. So we can pass
> the datasetId to the JobEventListenerFactory when requesting the
> transaction id.
> 2. We make one transaction work for multiple datasets by using a map from
> datasetId to primary opTracker and use it when reporting commits by the log
> flusher thread.
> 3. Prevent a job from having multiple transactions. (For the record, I
> dislike this option since the price we pay is very high IMO)
>
> Cheers,
> Abdullah.
>
> > On Nov 17, 2017, at 11:32 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> >
> > Well, we've solved the problem when there is only one transaction id per
> > job. The operators can fetch the transaction ids from the
> > JobEventListenerFactory (you can find this in master now). The issue is,
> > when we are trying to combine multiple job specs into one feed job, the
> > operators at runtime don't have a memory of which "job spec" they
> > originally belonged to which could tell them which one of the transaction
> > ids that they should use.
> >
> > Steven
> >
> > On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <ba...@gmail.com>
> > wrote:
> >
> >>
> >> I think that this works and seems like the question is how different
> >> operators in the job can get their transaction ids.
> >>
> >> ~Abdullah.
> >>
> >>> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> >>>
> >>> From the conversation, it seems like nobody has the full picture to
> >> propose
> >>> the design?
> >>> For deployed jobs, the idea is to use the same job specification but
> >> create
> >>> a new Hyracks job and Asterix Transaction for each execution.
> >>>
> >>> Steven
> >>>
> >>> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <
> bamousaa@gmail.com>
> >>> wrote:
> >>>
> >>>> I can e-meet anytime (moved to Sunnyvale). We can also look at a
> >> proposed
> >>>> design and see if it can work
> >>>> Back to my question, how were you planning to change the transaction
> id
> >> if
> >>>> we forget about the case with multiple datasets (feed job)?
> >>>>
> >>>>
> >>>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sj...@ucr.edu>
> wrote:
> >>>>>
> >>>>> Maybe it would be good to have a meeting about this with all
> interested
> >>>>> parties?
> >>>>>
> >>>>> I can be on-campus at UCI on Tuesday if that would be a good day to
> >> meet.
> >>>>>
> >>>>> Steven
> >>>>>
> >>>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <
> bamousaa@gmail.com
> >>>
> >>>>> wrote:
> >>>>>
> >>>>>> Also, was wondering how would you do the same for a single dataset
> >>>>>> (non-feed). How would you get the transaction id and change it when
> >> you
> >>>>>> re-run?
> >>>>>>
> >>>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hu...@gmail.com>
> >> wrote:
> >>>>>>
> >>>>>>> For atomic transactions, the change was merged yesterday. For
> entity
> >>>>>> level
> >>>>>>> transactions, it should be a very small change.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Murtadha
> >>>>>>>
> >>>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <
> bamousaa@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> I understand that is not the case right now but what you're
> working
> >>>> on?
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Abdullah.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <
> hubailmor@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> A transaction context can register multiple primary indexes.
> >>>>>>>>> Since each entity commit log contains the dataset id, you can
> >>>>>> decrement
> >>>>>>> the active operations on
> >>>>>>>>> the operation tracker associated with that dataset id.
> >>>>>>>>>
> >>>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to
> know.
> >>>>>>>>> Moreover, the reason for the multiple transaction ids in feeds is
> >>>>>> not
> >>>>>>> simply because we compile them differently.
> >>>>>>>>>
> >>>>>>>>> How would a commit operator know which dataset active operation
> >>>>>>> counter to decrement if they share the same id for example?
> >>>>>>>>>
> >>>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
> >>>>>>> mappings for
> >>>>>>>>>> the jobs and transactions, except for the feeds.
> >>>>>>>>>>
> >>>>>>>>>> @Abdullah, after some digging into the code, I think probably we
> >> can
> >>>>>>> use a
> >>>>>>>>>> single transaction id for the job which feeds multiple datasets?
> >> See
> >>>>>>> if I
> >>>>>>>>>> can convince you. :)
> >>>>>>>>>>
> >>>>>>>>>> The reason we have multiple transaction ids in feeds is that we
> >>>>>> compile
> >>>>>>>>>> each connection job separately and combine them into a single
> feed
> >>>>>>> job. A
> >>>>>>>>>> new transaction id is created and assigned to each connection
> job,
> >>>>>>> thus for
> >>>>>>>>>> the combined job, we have to handle the different transactions
> as
> >>>>>> they
> >>>>>>>>>> are embedded in the connection job specifications. But, what if
> we
> >>>>>>> create a
> >>>>>>>>>> single transaction id for the combined job? That transaction id
> >> will
> >>>>>> be
> >>>>>>>>>> embedded into each connection so they can write logs freely, but
> >> the
> >>>>>>>>>> transaction will be started and committed only once as there is
> >> only
> >>>>>>> one
> >>>>>>>>>> feed job. In this way, we won't need
> >> multiTransactionJobletEventLis
> >>>>>>> tener
> >>>>>>>>>> and the transaction id can be removed from the job specification
> >>>>>>> easily as
> >>>>>>>>>> well (for Steven's change).
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Xikui
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dtabass@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I worry about deadlocks.  The waits for graph may not
> understand
> >>>>>> that
> >>>>>>>>>>> making t1 wait will also make t2 wait since they may share a
> >> thread
> >>>>>> -
> >>>>>>>>>>> right?  Or do we have jobs and transactions separately
> >> represented
> >>>>>>> there
> >>>>>>>>>>> now?
> >>>>>>>>>>>
> >>>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <
> >> bamousaa@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> We are using multiple transactions in a single job in case of
> >> feed
> >>>>>>> and I
> >>>>>>>>>>>> think that this is the correct way.
> >>>>>>>>>>>> Having a single job for a feed that feeds into multiple
> datasets
> >>>>>> is a
> >>>>>>>>>>> good
> >>>>>>>>>>>> thing since job resources/feed resources are consolidated.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Here are some points:
> >>>>>>>>>>>> - We can't use the same transaction id to feed multiple
> >> datasets.
> >>>>>> The
> >>>>>>>>>>> only
> >>>>>>>>>>>> other option is to have multiple jobs each feeding a different
> >>>>>>> dataset.
> >>>>>>>>>>>> - Having multiple jobs (in addition to the extra resources
> used,
> >>>>>>> memory
> >>>>>>>>>>>> and CPU) would then force us to either read data from
> external
> >>>>>>> sources
> >>>>>>>>>>>> multiple times, parse records multiple times, etc
> >>>>>>>>>>>> or having to have a synchronization between the different jobs
> >> and
> >>>>>>> the
> >>>>>>>>>>>> feed source within asterixdb. IMO, this is far more
> complicated
> >>>>>> than
> >>>>>>>>>>> having
> >>>>>>>>>>>> multiple transactions within a single job and the cost far
> >>>> outweighs
> >>>>>>> the
> >>>>>>>>>>>> benefits.
> >>>>>>>>>>>>
> >>>>>>>>>>>> P.S,
> >>>>>>>>>>>> We are also using this for bucket connections in Couchbase
> >>>>>> Analytics.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <tillw@apache.org
> >
> >>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If there are a number of issues with supporting multiple
> >>>>>> transaction
> >>>>>>> ids
> >>>>>>>>>>>>> and no clear benefits/use-cases, I’d vote for simplification
> :)
> >>>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot" and
> >> so I
> >>>>>>> think
> >>>>>>>>>>>>> that its usefulness might be limited by the time we’d find a
> >> use
> >>>>>>> for
> >>>>>>>>>>>>> this functionality.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> My 2c,
> >>>>>>>>>>>>> Till
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm separating the connections into different jobs in some
> of
> >> my
> >>>>>>>>>>>>>> experiments... but that was intended to be used for the
> >>>>>>> experimental
> >>>>>>>>>>>>>> settings (i.e., not for master now)...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think the interesting question here is whether we want to
> >>>> allow
> >>>>>>> one
> >>>>>>>>>>>>>> Hyracks job to carry multiple transactions. I personally
> think
> >>>>>> that
> >>>>>>>>>>>> should
> >>>>>>>>>>>>>> be allowed as the transaction and job are two separate
> >> concepts,
> >>>>>>> but I
> >>>>>>>>>>>>>> couldn't find such use cases other than the feeds. Does
> anyone
> >>>>>>> have a
> >>>>>>>>>>>> good
> >>>>>>>>>>>>>> example on this?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Another question is, if we do allow multiple transactions
> in a
> >>>>>>> single
> >>>>>>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain the
> >>>>>> correct
> >>>>>>> TXN
> >>>>>>>>>>>> id
> >>>>>>>>>>>>>> without having that embedded as part of the job
> specification.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Xikui
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
> >>>>>>>>>>> bamousaa@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I am curious as to how feed will work without this?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> ~Abdullah.
> >>>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <
> >> sjaco002@ucr.edu
> >>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
> >>>>>>> which
> >>>>>>>>>>>>>>> allows
> >>>>>>>>>>>>>>>> for one Hyracks job to run multiple Asterix transactions
> >>>>>>> together.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in process
> >> of
> >>>>>>>>>>>> changing to
> >>>>>>>>>>>>>>>> no longer need this feature. As part of the work in
> >>>>>> pre-deploying
> >>>>>>>>>>> job
> >>>>>>>>>>>>>>>> specifications to be used by multiple hyracks jobs, I've
> >> been
> >>>>>>>>>>> working
> >>>>>>>>>>>> on
> >>>>>>>>>>>>>>>> removing the transaction id from the job specifications,
> as
> >> we
> >>>>>>> use a
> >>>>>>>>>>>> new
> >>>>>>>>>>>>>>>> transaction for each invocation of a deployed job.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> There is currently no clear way to remove the transaction
> id
> >>>>>> from
> >>>>>>>>>>> the
> >>>>>>>>>>>> job
> >>>>>>>>>>>>>>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The question for the group is, do we see a need to
> maintain
> >>>>>> this
> >>>>>>>>>>> class
> >>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>> will no longer be used by any current code? Or, in other
> >>>> words,
> >>>>>>> is
> >>>>>>>>>>>> there
> >>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>> strong possibility that in the future we will want
> multiple
> >>>>>>>>>>>> transactions
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> share a single Hyracks job, meaning that it is worth
> >> figuring
> >>>>>> out
> >>>>>>>>>>> how
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> maintain this class?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Steven
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>

Re: MultiTransactionJobletEventListenerFactory

Posted by abdullah alamoudi <ba...@gmail.com>.
So, there are three options to do this:
1. Each of these operators works on a specific dataset, so we can pass the datasetId to the JobEventListenerFactory when requesting the transaction id.
2. We make one transaction work for multiple datasets by using a map from datasetId to primary opTracker, and use it when the log flusher thread reports commits.
3. Prevent a job from having multiple transactions. (For the record, I dislike this option since the price we pay is very high, IMO.)
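To make option 1 concrete, here is a rough sketch (all class and method names below are illustrative assumptions, not AsterixDB's actual API): the listener factory keeps a datasetId-to-transaction-id map that each commit operator consults once when its job starts, rather than once per record.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of option 1 -- NOT the real
// MultiTransactionJobletEventListenerFactory API.
class DatasetTxnRegistry {
    private final Map<Integer, Long> txnIdByDataset = new HashMap<>();

    // Called once per dataset connection when the combined feed job is built.
    void register(int datasetId, long txnId) {
        txnIdByDataset.put(datasetId, txnId);
    }

    // A commit operator passes the id of the dataset it writes to; it would
    // cache the result, so the map is hit once per job rather than per record.
    long txnIdFor(int datasetId) {
        Long txnId = txnIdByDataset.get(datasetId);
        if (txnId == null) {
            throw new IllegalStateException("no transaction registered for dataset " + datasetId);
        }
        return txnId;
    }
}
```

Under this sketch the commit runtime no longer needs the transaction id embedded in the job spec; it only needs the dataset id it already knows.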

Cheers,
Abdullah.

> On Nov 17, 2017, at 11:32 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> 
> Well, we've solved the problem for the case where there is only one
> transaction id per job: the operators can fetch the transaction ids from the
> JobEventListenerFactory (you can find this in master now). The issue is that
> when we are trying to combine multiple job specs into one feed job, the
> operators at runtime have no memory of which "job spec" they originally
> belonged to, which would tell them which of the transaction ids they should
> use.
> 
> Steven
> 
> On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <ba...@gmail.com>
> wrote:
> 
>> 
>> I think that this works and seems like the question is how different
>> operators in the job can get their transaction ids.
>> 
>> ~Abdullah.
>> 
>>> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sj...@ucr.edu> wrote:
>>> 
>>> From the conversation, it seems like nobody has the full picture to
>> propose
>>> the design?
>>> For deployed jobs, the idea is to use the same job specification but
>> create
>>> a new Hyracks job and Asterix Transaction for each execution.
>>> 
>>> Steven
>>> 
>>> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <ba...@gmail.com>
>>> wrote:
>>> 
>>>> I can e-meet anytime (moved to Sunnyvale). We can also look at a
>> proposed
>>>> design and see if it can work
>>>> Back to my question, how were you planning to change the transaction id
>> if
>>>> we forget about the case with multiple datasets (feed job)?
>>>> 
>>>> 
>>>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sj...@ucr.edu> wrote:
>>>>> 
>>>>> Maybe it would be good to have a meeting about this with all interested
>>>>> parties?
>>>>> 
>>>>> I can be on-campus at UCI on Tuesday if that would be a good day to
>> meet.
>>>>> 
>>>>> Steven
>>>>> 
>>>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <bamousaa@gmail.com
>>> 
>>>>> wrote:
>>>>> 
>>>>>> Also, was wondering how would you do the same for a single dataset
>>>>>> (non-feed). How would you get the transaction id and change it when
>> you
>>>>>> re-run?
>>>>>> 
>>>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hu...@gmail.com>
>> wrote:
>>>>>> 
>>>>>>> For atomic transactions, the change was merged yesterday. For entity
>>>>>> level
>>>>>>> transactions, it should be a very small change.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Murtadha
>>>>>>> 
>>>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <ba...@gmail.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I understand that is not the case right now but what you're working
>>>> on?
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Abdullah.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hu...@gmail.com>
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> A transaction context can register multiple primary indexes.
>>>>>>>>> Since each entity commit log contains the dataset id, you can
>>>>>> decrement
>>>>>>> the active operations on
>>>>>>>>> the operation tracker associated with that dataset id.
>>>>>>>>> 
>>>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to know.
>>>>>>>>> Moreover, the reason for the multiple transaction ids in feeds is
>>>>>> not
>>>>>>> simply because we compile them differently.
>>>>>>>>> 
>>>>>>>>> How would a commit operator know which dataset active operation
>>>>>>> counter to decrement if they share the same id for example?
>>>>>>>>> 
>>>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
>>>>>>> mappings for
>>>>>>>>>> the jobs and transactions, except for the feeds.
>>>>>>>>>> 
>>>>>>>>>> @Abdullah, after some digging into the code, I think probably we
>> can
>>>>>>> use a
>>>>>>>>>> single transaction id for the job which feeds multiple datasets?
>> See
>>>>>>> if I
>>>>>>>>>> can convince you. :)
>>>>>>>>>> 
>>>>>>>>>> The reason we have multiple transaction ids in feeds is that we
>>>>>> compile
>>>>>>>>>> each connection job separately and combine them into a single feed
>>>>>>> job. A
>>>>>>>>>> new transaction id is created and assigned to each connection job,
>>>>>>> thus for
>>>>>>>>>> the combined job, we have to handle the different transactions as
>>>>>> they
>>>>>>>>>> are embedded in the connection job specifications. But, what if we
>>>>>>> create a
>>>>>>>>>> single transaction id for the combined job? That transaction id
>> will
>>>>>> be
>>>>>>>>>> embedded into each connection so they can write logs freely, but
>> the
>>>>>>>>>> transaction will be started and committed only once as there is
>> only
>>>>>>> one
>>>>>>>> feed job. In this way, we won't need multiTransactionJobletEventListener
>>>>>>>>>> and the transaction id can be removed from the job specification
>>>>>>> easily as
>>>>>>>>>> well (for Steven's change).
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Xikui
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I worry about deadlocks.  The waits for graph may not understand
>>>>>> that
>>>>>>>>>>> making t1 wait will also make t2 wait since they may share a
>> thread
>>>>>> -
>>>>>>>>>>> right?  Or do we have jobs and transactions separately
>> represented
>>>>>>> there
>>>>>>>>>>> now?
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <
>> bamousaa@gmail.com>
>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> We are using multiple transactions in a single job in case of
>> feed
>>>>>>> and I
>>>>>>>>>>>> think that this is the correct way.
>>>>>>>>>>>> Having a single job for a feed that feeds into multiple datasets
>>>>>> is a
>>>>>>>>>>> good
>>>>>>>>>>>> thing since job resources/feed resources are consolidated.
>>>>>>>>>>>> 
>>>>>>>>>>>> Here are some points:
>>>>>>>>>>>> - We can't use the same transaction id to feed multiple
>> datasets.
>>>>>> The
>>>>>>>>>>> only
>>>>>>>>>>>> other option is to have multiple jobs each feeding a different
>>>>>>> dataset.
>>>>>>>>>>>> - Having multiple jobs (in addition to the extra resources used,
>>>>>>> memory
>>>>>>>>>>>> and CPU) would then force us to either read data from external
>>>>>>> sources
>>>>>>>>>>>> multiple times, parse records multiple times, etc
>>>>>>>>>>>> or having to have a synchronization between the different jobs
>> and
>>>>>>> the
>>>>>>>>>>>> feed source within asterixdb. IMO, this is far more complicated
>>>>>> than
>>>>>>>>>>> having
>>>>>>>>>>>> multiple transactions within a single job and the cost far
>> outweighs
>>>>>>> the
>>>>>>>>>>>> benefits.
>>>>>>>>>>>> 
>>>>>>>>>>>> P.S,
>>>>>>>>>>>> We are also using this for bucket connections in Couchbase
>>>>>> Analytics.
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>> If there are a number of issues with supporting multiple
>>>>>> transaction
>>>>>>> ids
>>>>>>>>>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
>>>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot" and
>> so I
>>>>>>> think
>>>>>>>>>>> that its usefulness might be limited by the time we’d find a
>> use
>>>>>>> for
>>>>>>>>>>>>> this functionality.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> My 2c,
>>>>>>>>>>>>> Till
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'm separating the connections into different jobs in some of
>> my
>>>>>>>>>>>>>> experiments... but that was intended to be used for the
>>>>>>> experimental
>>>>>>>>>>>>>> settings (i.e., not for master now)...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think the interesting question here is whether we want to
>>>> allow
>>>>>>> one
>>>>>>>>>>>>>> Hyracks job to carry multiple transactions. I personally think
>>>>>> that
>>>>>>>>>>>> should
>>>>>>>>>>>>>> be allowed as the transaction and job are two separate
>> concepts,
>>>>>>> but I
>>>>>>>>>>>>>> couldn't find such use cases other than the feeds. Does anyone
>>>>>>> have a
>>>>>>>>>>>> good
>>>>>>>>>>>>>> example on this?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Another question is, if we do allow multiple transactions in a
>>>>>>> single
>>>>>>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain the
>>>>>> correct
>>>>>>> TXN
>>>>>>>>>>>> id
>>>>>>>>>>>>>> without having that embedded as part of the job specification.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Xikui
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>>>>>>>>>>> bamousaa@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I am curious as to how feed will work without this?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> ~Abdullah.
>>>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <
>> sjaco002@ucr.edu
>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
>>>>>>> which
>>>>>>>>>>>>>>> allows
>>>>>>>>>>>>>>>> for one Hyracks job to run multiple Asterix transactions
>>>>>>> together.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in process
>> of
>>>>>>>>>>>> changing to
>>>>>>>>>>>>>>>> no longer need this feature. As part of the work in
>>>>>> pre-deploying
>>>>>>>>>>> job
>>>>>>>>>>>>>>>> specifications to be used by multiple hyracks jobs, I've
>> been
>>>>>>>>>>> working
>>>>>>>>>>>> on
>>>>>>>>>>>>>>>> removing the transaction id from the job specifications, as
>> we
>>>>>>> use a
>>>>>>>>>>>> new
>>>>>>>>>>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> There is currently no clear way to remove the transaction id
>>>>>> from
>>>>>>>>>>> the
>>>>>>>>>>>> job
>>>>>>>>>>>>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The question for the group is, do we see a need to maintain
>>>>>> this
>>>>>>>>>>> class
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> will no longer be used by any current code? Or, in other
>>>> words,
>>>>>>> is
>>>>>>>>>>>> there
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>> strong possibility that in the future we will want multiple
>>>>>>>>>>>> transactions
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> share a single Hyracks job, meaning that it is worth
>> figuring
>>>>>> out
>>>>>>>>>>> how
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> maintain this class?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 


Re: MultiTransactionJobletEventListenerFactory

Posted by Steven Jacobs <sj...@ucr.edu>.
Well, we've solved the problem for the case where there is only one
transaction id per job: the operators can fetch the transaction ids from the
JobEventListenerFactory (you can find this in master now). The issue is that
when we are trying to combine multiple job specs into one feed job, the
operators at runtime have no memory of which "job spec" they originally
belonged to, which would tell them which of the transaction ids they should
use.
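The solved single-transaction case can be sketched roughly as follows (names are illustrative, not the actual AsterixDB classes): each run of a pre-deployed job spec constructs a fresh listener factory, which mints a new transaction id, and every operator in that run fetches the same id from it.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only -- not the real JobEventListenerFactory.
class PerJobTxnListenerFactory {
    // Stand-in for the system's transaction id generator.
    private static final AtomicLong TXN_ID_SEQUENCE = new AtomicLong();

    private final long txnId;

    // One factory per job invocation: the deployed job spec itself carries
    // no transaction id; a fresh one is minted here for each run.
    PerJobTxnListenerFactory() {
        this.txnId = TXN_ID_SEQUENCE.incrementAndGet();
    }

    // Every operator in this job invocation sees the same transaction id.
    long getTxnId() {
        return txnId;
    }
}
```

The merged feed job breaks this scheme precisely because a single factory can only hand back one id, while the merged operators need several.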

Steven

On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

>
> I think that this works and seems like the question is how different
> operators in the job can get their transaction ids.
>
> ~Abdullah.
>
> > On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> >
> > From the conversation, it seems like nobody has the full picture to
> propose
> > the design?
> > For deployed jobs, the idea is to use the same job specification but
> create
> > a new Hyracks job and Asterix Transaction for each execution.
> >
> > Steven
> >
> > On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <ba...@gmail.com>
> > wrote:
> >
> >> I can e-meet anytime (moved to Sunnyvale). We can also look at a
> proposed
> >> design and see if it can work
> >> Back to my question, how were you planning to change the transaction id
> if
> >> we forget about the case with multiple datasets (feed job)?
> >>
> >>
> >>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> >>>
> >>> Maybe it would be good to have a meeting about this with all interested
> >>> parties?
> >>>
> >>> I can be on-campus at UCI on Tuesday if that would be a good day to
> meet.
> >>>
> >>> Steven
> >>>
> >>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <bamousaa@gmail.com
> >
> >>> wrote:
> >>>
> >>>> Also, was wondering how would you do the same for a single dataset
> >>>> (non-feed). How would you get the transaction id and change it when
> you
> >>>> re-run?
> >>>>
> >>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hu...@gmail.com>
> wrote:
> >>>>
> >>>>> For atomic transactions, the change was merged yesterday. For entity
> >>>> level
> >>>>> transactions, it should be a very small change.
> >>>>>
> >>>>> Cheers,
> >>>>> Murtadha
> >>>>>
> >>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <ba...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> I understand that is not the case right now but what you're working
> >> on?
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Abdullah.
> >>>>>>
> >>>>>>
> >>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hu...@gmail.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>> A transaction context can register multiple primary indexes.
> >>>>>>> Since each entity commit log contains the dataset id, you can
> >>>> decrement
> >>>>> the active operations on
> >>>>>>> the operation tracker associated with that dataset id.
> >>>>>>>
> >>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>> Can you illustrate how a deadlock can happen? I am anxious to know.
> >>>>>>> Moreover, the reason for the multiple transaction ids in feeds is
> >>>> not
> >>>>> simply because we compile them differently.
> >>>>>>>
> >>>>>>> How would a commit operator know which dataset active operation
> >>>>> counter to decrement if they share the same id for example?
> >>>>>>>
> >>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
> >>>>>>>>
> >>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
> >>>>> mappings for
> >>>>>>>> the jobs and transactions, except for the feeds.
> >>>>>>>>
> >>>>>>>> @Abdullah, after some digging into the code, I think probably we
> can
> >>>>> use a
> >>>>>>>> single transaction id for the job which feeds multiple datasets?
> See
> >>>>> if I
> >>>>>>>> can convince you. :)
> >>>>>>>>
> >>>>>>>> The reason we have multiple transaction ids in feeds is that we
> >>>> compile
> >>>>>>>> each connection job separately and combine them into a single feed
> >>>>> job. A
> >>>>>>>> new transaction id is created and assigned to each connection job,
> >>>>> thus for
> >>>>>>>> the combined job, we have to handle the different transactions as
> >>>> they
> >>>>>>>> are embedded in the connection job specifications. But, what if we
> >>>>> create a
> >>>>>>>> single transaction id for the combined job? That transaction id
> will
> >>>> be
> >>>>>>>> embedded into each connection so they can write logs freely, but
> the
> >>>>>>>> transaction will be started and committed only once as there is
> only
> >>>>> one
> >>>>>>>> feed job. In this way, we won't need multiTransactionJobletEventListener
> >>>>>>>> and the transaction id can be removed from the job specification
> >>>>> easily as
> >>>>>>>> well (for Steven's change).
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Xikui
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> I worry about deadlocks.  The waits for graph may not understand
> >>>> that
> >>>>>>>>> making t1 wait will also make t2 wait since they may share a
> thread
> >>>> -
> >>>>>>>>> right?  Or do we have jobs and transactions separately
> represented
> >>>>> there
> >>>>>>>>> now?
> >>>>>>>>>
> >>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <
> bamousaa@gmail.com>
> >>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> We are using multiple transactions in a single job in case of
> feed
> >>>>> and I
> >>>>>>>>>> think that this is the correct way.
> >>>>>>>>>> Having a single job for a feed that feeds into multiple datasets
> >>>> is a
> >>>>>>>>> good
> >>>>>>>>>> thing since job resources/feed resources are consolidated.
> >>>>>>>>>>
> >>>>>>>>>> Here are some points:
> >>>>>>>>>> - We can't use the same transaction id to feed multiple
> datasets.
> >>>> The
> >>>>>>>>> only
> >>>>>>>>>> other option is to have multiple jobs each feeding a different
> >>>>> dataset.
> >>>>>>>>>> - Having multiple jobs (in addition to the extra resources used,
> >>>>> memory
> >>>>>>>>>> and CPU) would then force us to either read data from external
> >>>>> sources
> >>>>>>>>>> multiple times, parse records multiple times, etc
> >>>>>>>>>> or having to have a synchronization between the different jobs
> and
> >>>>> the
> >>>>>>>>>> feed source within asterixdb. IMO, this is far more complicated
> >>>> than
> >>>>>>>>> having
> >>>>>>>>>> multiple transactions within a single job and the cost far
> >> outweighs
> >>>>> the
> >>>>>>>>>> benefits.
> >>>>>>>>>>
> >>>>>>>>>> P.S,
> >>>>>>>>>> We are also using this for bucket connections in Couchbase
> >>>> Analytics.
> >>>>>>>>>>
> >>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org>
> >>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> If there are a number of issues with supporting multiple
> >>>> transaction
> >>>>> ids
> >>>>>>>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
> >>>>>>>>>>> Also, code that’s not being used has a tendency to "rot" and
> so I
> >>>>> think
> >>>>>>>>>>> that its usefulness might be limited by the time we’d find a
> use
> >>>>> for
> >>>>>>>>>>> this functionality.
> >>>>>>>>>>>
> >>>>>>>>>>> My 2c,
> >>>>>>>>>>> Till
> >>>>>>>>>>>
> >>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm separating the connections into different jobs in some of
> my
> >>>>>>>>>>>> experiments... but that was intended to be used for the
> >>>>> experimental
> >>>>>>>>>>>> settings (i.e., not for master now)...
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think the interesting question here is whether we want to
> >> allow
> >>>>> one
> >>>>>>>>>>>> Hyracks job to carry multiple transactions. I personally think
> >>>> that
> >>>>>>>>>> should
> >>>>>>>>>>>> be allowed as the transaction and job are two separate
> concepts,
> >>>>> but I
> >>>>>>>>>>>> couldn't find such use cases other than the feeds. Does anyone
> >>>>> have a
> >>>>>>>>>> good
> >>>>>>>>>>>> example on this?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Another question is, if we do allow multiple transactions in a
> >>>>> single
> >>>>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain the
> >>>> correct
> >>>>> TXN
> >>>>>>>>>> id
> >>>>>>>>>>>> without having that embedded as part of the job specification.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Xikui
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
> >>>>>>>>> bamousaa@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I am curious as to how feed will work without this?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ~Abdullah.
> >>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <
> sjaco002@ucr.edu
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
> >>>>> which
> >>>>>>>>>>>>> allows
> >>>>>>>>>>>>>> for one Hyracks job to run multiple Asterix transactions
> >>>>> together.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This class is only used by feeds, and feeds are in process
> of
> >>>>>>>>>> changing to
> >>>>>>>>>>>>>> no longer need this feature. As part of the work in
> >>>> pre-deploying
> >>>>>>>>> job
> >>>>>>>>>>>>>> specifications to be used by multiple hyracks jobs, I've
> been
> >>>>>>>>> working
> >>>>>>>>>> on
> >>>>>>>>>>>>>> removing the transaction id from the job specifications, as
> we
> >>>>> use a
> >>>>>>>>>> new
> >>>>>>>>>>>>>> transaction for each invocation of a deployed job.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> There is currently no clear way to remove the transaction id
> >>>> from
> >>>>>>>>> the
> >>>>>>>>>> job
> >>>>>>>>>>>>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The question for the group is, do we see a need to maintain
> >>>> this
> >>>>>>>>> class
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>> will no longer be used by any current code? Or, in other
> >> words,
> >>>>> is
> >>>>>>>>>> there
> >>>>>>>>>>>>> a
> >>>>>>>>>>>>>> strong possibility that in the future we will want multiple
> >>>>>>>>>> transactions
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>> share a single Hyracks job, meaning that it is worth
> figuring
> >>>> out
> >>>>>>>>> how
> >>>>>>>>>> to
> >>>>>>>>>>>>>> maintain this class?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Steven
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>

Re: MultiTransactionJobletEventListenerFactory

Posted by abdullah alamoudi <ba...@gmail.com>.
I think that this works, and it seems like the question is how the different operators in the job can get their transaction ids.

~Abdullah.

> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> 
> From the conversation, it seems like nobody has the full picture to propose
> the design?
> For deployed jobs, the idea is to use the same job specification but create
> a new Hyracks job and Asterix Transaction for each execution.
> 
> Steven
> 
> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <ba...@gmail.com>
> wrote:
> 
>> I can e-meet anytime (moved to Sunnyvale). We can also look at a proposed
>> design and see if it can work
>> Back to my question, how were you planning to change the transaction id if
>> we forget about the case with multiple datasets (feed job)?
>> 
>> 
>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sj...@ucr.edu> wrote:
>>> 
>>> Maybe it would be good to have a meeting about this with all interested
>>> parties?
>>> 
>>> I can be on-campus at UCI on Tuesday if that would be a good day to meet.
>>> 
>>> Steven
>>> 
>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <ba...@gmail.com>
>>> wrote:
>>> 
>>>> Also, was wondering how would you do the same for a single dataset
>>>> (non-feed). How would you get the transaction id and change it when you
>>>> re-run?
>>>> 
>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hu...@gmail.com> wrote:
>>>> 
>>>>> For atomic transactions, the change was merged yesterday. For entity
>>>> level
>>>>> transactions, it should be a very small change.
>>>>> 
>>>>> Cheers,
>>>>> Murtadha
>>>>> 
>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <ba...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> I understand that is not the case right now but what you're working
>> on?
>>>>>> 
>>>>>> Cheers,
>>>>>> Abdullah.
>>>>>> 
>>>>>> 
>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hu...@gmail.com>
>>>>> wrote:
>>>>>>> 
>>>>>>> A transaction context can register multiple primary indexes.
>>>>>>> Since each entity commit log contains the dataset id, you can
>>>> decrement
>>>>> the active operations on
>>>>>>> the operation tracker associated with that dataset id.
>>>>>>> 
>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com>
>>>> wrote:
>>>>>>> 
>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to know.
>>>>>>> Moreover, the reason for the multiple transaction ids in feeds is
>>>> not
>>>>> simply because we compile them differently.
>>>>>>> 
>>>>>>> How would a commit operator know which dataset active operation
>>>>> counter to decrement if they share the same id for example?
>>>>>>> 
>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
>>>>>>>> 
>>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
>>>>> mappings for
>>>>>>>> the jobs and transactions, except for the feeds.
>>>>>>>> 
>>>>>>>> @Abdullah, after some digging into the code, I think probably we can
>>>>> use a
>>>>>>>> single transaction id for the job which feeds multiple datasets? See
>>>>> if I
>>>>>>>> can convince you. :)
>>>>>>>> 
>>>>>>>> The reason we have multiple transaction ids in feeds is that we
>>>> compile
>>>>>>>> each connection job separately and combine them into a single feed
>>>>> job. A
>>>>>>>> new transaction id is created and assigned to each connection job,
>>>>> thus for
>>>>>>>> the combined job, we have to handle the different transactions as
>>>> they
>>>>>>>> are embedded in the connection job specifications. But, what if we
>>>>> create a
>>>>>>>> single transaction id for the combined job? That transaction id will
>>>> be
>>>>>>>> embedded into each connection so they can write logs freely, but the
>>>>>>>> transaction will be started and committed only once as there is only
>>>>> one
>>>>>>>> feed job. In this way, we won't need multiTransactionJobletEventListener
>>>>>>>> and the transaction id can be removed from the job specification
>>>>> easily as
>>>>>>>> well (for Steven's change).
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Xikui
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> I worry about deadlocks.  The waits for graph may not understand
>>>> that
>>>>>>>>> making t1 wait will also make t2 wait since they may share a thread
>>>> -
>>>>>>>>> right?  Or do we have jobs and transactions separately represented
>>>>> there
>>>>>>>>> now?
>>>>>>>>> 
>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> We are using multiple transactions in a single job in case of feed
>>>>> and I
>>>>>>>>>> think that this is the correct way.
>>>>>>>>>> Having a single job for a feed that feeds into multiple datasets
>>>> is a
>>>>>>>>> good
>>>>>>>>>> thing since job resources/feed resources are consolidated.
>>>>>>>>>> 
>>>>>>>>>> Here are some points:
>>>>>>>>>> - We can't use the same transaction id to feed multiple datasets.
>>>> The
>>>>>>>>> only
>>>>>>>>>> other option is to have multiple jobs each feeding a different
>>>>> dataset.
>>>>>>>>>> - Having multiple jobs (in addition to the extra resources used,
>>>>> memory
>>>>>>>>>> and CPU) would then forces us to either read data from external
>>>>> sources
>>>>>>>>>> multiple times, parse records multiple times, etc
>>>>>>>>>> or having to have a synchronization between the different jobs and
>>>>> the
>>>>>>>>>> feed source within asterixdb. IMO, this is far more complicated
>>>> than
>>>>>>>>> having
>>>>>>>>>> multiple transactions within a single job and the cost far
>> outweigh
>>>>> the
>>>>>>>>>> benefits.
>>>>>>>>>> 
>>>>>>>>>> P.S,
>>>>>>>>>> We are also using this for bucket connections in Couchbase
>>>> Analytics.
>>>>>>>>>> 
>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org>
>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> If there are a number of issue with supporting multiple
>>>> transaction
>>>>> ids
>>>>>>>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot" and so I
>>>>> think
>>>>>>>>>>> that it’s usefulness might be limited by the time we’d find a use
>>>>> for
>>>>>>>>>>> this functionality.
>>>>>>>>>>> 
>>>>>>>>>>> My 2c,
>>>>>>>>>>> Till
>>>>>>>>>>> 
>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm separating the connections into different jobs in some of my
>>>>>>>>>>>> experiments... but that was intended to be used for the
>>>>> experimental
>>>>>>>>>>>> settings (i.e., not for master now)...
>>>>>>>>>>>> 
>>>>>>>>>>>> I think the interesting question here is whether we want to
>> allow
>>>>> one
>>>>>>>>>>>> Hyracks job to carry multiple transactions. I personally think
>>>> that
>>>>>>>>>> should
>>>>>>>>>>>> be allowed as the transaction and job are two separate concepts,
>>>>> but I
>>>>>>>>>>>> couldn't find such use cases other than the feeds. Does anyone
>>>>> have a
>>>>>>>>>> good
>>>>>>>>>>>> example on this?
>>>>>>>>>>>> 
>>>>>>>>>>>> Another question is, if we do allow multiple transactions in a
>>>>> single
>>>>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain the
>>>> correct
>>>>> TXN
>>>>>>>>>> id
>>>>>>>>>>>> without having that embedded as part of the job specification.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Xikui
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>>>>>>>>> bamousaa@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I am curious as to how feed will work without this?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ~Abdullah.
>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sjaco002@ucr.edu
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
>>>>> which
>>>>>>>>>>>>> allows
>>>>>>>>>>>>>> for one Hyracks job to run multiple Asterix transactions
>>>>> together.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in process of
>>>>>>>>>> changing to
>>>>>>>>>>>>>> no longer need this feature. As part of the work in
>>>> pre-deploying
>>>>>>>>> job
>>>>>>>>>>>>>> specifications to be used by multiple hyracks jobs, I've been
>>>>>>>>> working
>>>>>>>>>> on
>>>>>>>>>>>>>> removing the transaction id from the job specifications, as we
>>>>> use a
>>>>>>>>>> new
>>>>>>>>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There is currently no clear way to remove the transaction id
>>>> from
>>>>>>>>> the
>>>>>>>>>> job
>>>>>>>>>>>>>> spec and keep the option for MultiTransactionJobletEventLis
>>>>>>>>>> tenerFactory.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The question for the group is, do we see a need to maintain
>>>> this
>>>>>>>>> class
>>>>>>>>>>>>> that
>>>>>>>>>>>>>> will no longer be used by any current code? Or, an other
>> words,
>>>>> is
>>>>>>>>>> there
>>>>>>>>>>>>> a
>>>>>>>>>>>>>> strong possibility that in the future we will want multiple
>>>>>>>>>> transactions
>>>>>>>>>>>>> to
>>>>>>>>>>>>>> share a single Hyracks job, meaning that it is worth figuring
>>>> out
>>>>>>>>> how
>>>>>>>>>> to
>>>>>>>>>>>>>> maintain this class?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>> 


Re: MultiTransactionJobletEventListenerFactory

Posted by Steven Jacobs <sj...@ucr.edu>.
From the conversation, it seems that no one yet has the full picture needed to propose a design.
For deployed jobs, the idea is to use the same job specification but create
a new Hyracks job and AsterixDB transaction for each execution.

Steven
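The model above (one reusable specification, a fresh transaction per run) can be
sketched roughly as follows. This is purely an illustrative sketch, not the actual
AsterixDB/Hyracks API; all class names here (DeployedSpec, Invocation, TXN_ID_GEN)
are hypothetical stand-ins:

```java
import java.util.concurrent.atomic.AtomicLong;

public class DeployedJobSketch {
    // Stand-in for a global transaction-id generator.
    private static final AtomicLong TXN_ID_GEN = new AtomicLong();

    // One reusable, pre-deployed job specification (carries no txn id).
    static final class DeployedSpec {
        final String deployedJobId;
        DeployedSpec(String id) { this.deployedJobId = id; }
    }

    // Per-invocation state: the spec is shared, the transaction is not.
    static final class Invocation {
        final DeployedSpec spec;
        final long txnId;
        Invocation(DeployedSpec spec) {
            this.spec = spec;
            this.txnId = TXN_ID_GEN.incrementAndGet(); // new txn per run
        }
    }

    public static void main(String[] args) {
        DeployedSpec spec = new DeployedSpec("feed-job-1");
        Invocation first = new Invocation(spec);
        Invocation second = new Invocation(spec);
        System.out.println(first.spec == second.spec);   // same spec reused
        System.out.println(first.txnId != second.txnId); // distinct transactions
    }
}
```

The point of the sketch is only the ownership split: the specification is immutable
and shared across runs, while transaction identity lives entirely in the invocation.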

On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> I can e-meet anytime (moved to Sunnyvale). We can also look at a proposed
> design and see if it can work
> Back to my question, how were you planning to change the transaction id if
> we forget about the case with multiple datasets (feed job)?
>
>
> > On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> >
> > Maybe it would be good to have a meeting about this with all interested
> > parties?
> >
> > I can be on-campus at UCI on Tuesday if that would be a good day to meet.
> >
> > Steven
> >
> > On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <ba...@gmail.com>
> > wrote:
> >
> >> Also, was wondering how would you do the same for a single dataset
> >> (non-feed). How would you get the transaction id and change it when you
> >> re-run?
> >>
> >> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hu...@gmail.com> wrote:
> >>
> >>> For atomic transactions, the change was merged yesterday. For entity
> >> level
> >>> transactions, it should be a very small change.
> >>>
> >>> Cheers,
> >>> Murtadha
> >>>
> >>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <ba...@gmail.com>
> >>> wrote:
> >>>>
> >>>> I understand that is not the case right now but what you're working
> on?
> >>>>
> >>>> Cheers,
> >>>> Abdullah.
> >>>>
> >>>>
> >>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hu...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>> A transaction context can register multiple primary indexes.
> >>>>> Since each entity commit log contains the dataset id, you can
> >> decrement
> >>> the active operations on
> >>>>> the operation tracker associated with that dataset id.
> >>>>>
> >>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com>
> >> wrote:
> >>>>>
> >>>>>  Can you illustrate how a deadlock can happen? I am anxious to know.
> >>>>>  Moreover, the reason for the multiple transaction ids in feeds is
> >> not
> >>> simply because we compile them differently.
> >>>>>
> >>>>>  How would a commit operator know which dataset active operation
> >>> counter to decrement if they share the same id for example?
> >>>>>
> >>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
> >>>>>>
> >>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
> >>> mappings for
> >>>>>> the jobs and transactions, except for the feeds.
> >>>>>>
> >>>>>> @Abdullah, after some digging into the code, I think probably we can
> >>> use a
> >>>>>> single transaction id for the job which feeds multiple datasets? See
> >>> if I
> >>>>>> can convince you. :)
> >>>>>>
> >>>>>> The reason we have multiple transaction ids in feeds is that we
> >> compile
> >>>>>> each connection job separately and combine them into a single feed
> >>> job. A
> >>>>>> new transaction id is created and assigned to each connection job,
> >>> thus for
> >>>>>> the combined job, we have to handle the different transactions as
> >> they
> >>>>>> are embedded in the connection job specifications. But, what if we
> >>> create a
> >>>>>> single transaction id for the combined job? That transaction id will
> >> be
> >>>>>> embedded into each connection so they can write logs freely, but the
> >>>>>> transaction will be started and committed only once as there is only
> >>> one
> >>>>>> feed job. In this way, we won't need multiTransactionJobletEventLis
> >>> tener
> >>>>>> and the transaction id can be removed from the job specification
> >>> easily as
> >>>>>> well (for Steven's change).
> >>>>>>
> >>>>>> Best,
> >>>>>> Xikui
> >>>>>>
> >>>>>>
> >>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com>
> >>> wrote:
> >>>>>>>
> >>>>>>> I worry about deadlocks.  The waits for graph may not understand
> >> that
> >>>>>>> making t1 wait will also make t2 wait since they may share a thread
> >> -
> >>>>>>> right?  Or do we have jobs and transactions separately represented
> >>> there
> >>>>>>> now?
> >>>>>>>
> >>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com>
> >>> wrote:
> >>>>>>>>
> >>>>>>>> We are using multiple transactions in a single job in case of feed
> >>> and I
> >>>>>>>> think that this is the correct way.
> >>>>>>>> Having a single job for a feed that feeds into multiple datasets
> >> is a
> >>>>>>> good
> >>>>>>>> thing since job resources/feed resources are consolidated.
> >>>>>>>>
> >>>>>>>> Here are some points:
> >>>>>>>> - We can't use the same transaction id to feed multiple datasets.
> >> The
> >>>>>>> only
> >>>>>>>> other option is to have multiple jobs each feeding a different
> >>> dataset.
> >>>>>>>> - Having multiple jobs (in addition to the extra resources used,
> >>> memory
> >>>>>>>> and CPU) would then forces us to either read data from external
> >>> sources
> >>>>>>>> multiple times, parse records multiple times, etc
> >>>>>>>> or having to have a synchronization between the different jobs and
> >>> the
> >>>>>>>> feed source within asterixdb. IMO, this is far more complicated
> >> than
> >>>>>>> having
> >>>>>>>> multiple transactions within a single job and the cost far
> outweigh
> >>> the
> >>>>>>>> benefits.
> >>>>>>>>
> >>>>>>>> P.S,
> >>>>>>>> We are also using this for bucket connections in Couchbase
> >> Analytics.
> >>>>>>>>
> >>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org>
> >>> wrote:
> >>>>>>>>>
> >>>>>>>>> If there are a number of issue with supporting multiple
> >> transaction
> >>> ids
> >>>>>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
> >>>>>>>>> Also, code that’s not being used has a tendency to "rot" and so I
> >>> think
> >>>>>>>>> that it’s usefulness might be limited by the time we’d find a use
> >>> for
> >>>>>>>>> this functionality.
> >>>>>>>>>
> >>>>>>>>> My 2c,
> >>>>>>>>> Till
> >>>>>>>>>
> >>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>>>>
> >>>>>>>>>> I'm separating the connections into different jobs in some of my
> >>>>>>>>>> experiments... but that was intended to be used for the
> >>> experimental
> >>>>>>>>>> settings (i.e., not for master now)...
> >>>>>>>>>>
> >>>>>>>>>> I think the interesting question here is whether we want to
> allow
> >>> one
> >>>>>>>>>> Hyracks job to carry multiple transactions. I personally think
> >> that
> >>>>>>>> should
> >>>>>>>>>> be allowed as the transaction and job are two separate concepts,
> >>> but I
> >>>>>>>>>> couldn't find such use cases other than the feeds. Does anyone
> >>> have a
> >>>>>>>> good
> >>>>>>>>>> example on this?
> >>>>>>>>>>
> >>>>>>>>>> Another question is, if we do allow multiple transactions in a
> >>> single
> >>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain the
> >> correct
> >>> TXN
> >>>>>>>> id
> >>>>>>>>>> without having that embedded as part of the job specification.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Xikui
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
> >>>>>>> bamousaa@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I am curious as to how feed will work without this?
> >>>>>>>>>>>
> >>>>>>>>>>> ~Abdullah.
> >>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sjaco002@ucr.edu
> >
> >>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
> >>> which
> >>>>>>>>>>> allows
> >>>>>>>>>>>> for one Hyracks job to run multiple Asterix transactions
> >>> together.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This class is only used by feeds, and feeds are in process of
> >>>>>>>> changing to
> >>>>>>>>>>>> no longer need this feature. As part of the work in
> >> pre-deploying
> >>>>>>> job
> >>>>>>>>>>>> specifications to be used by multiple hyracks jobs, I've been
> >>>>>>> working
> >>>>>>>> on
> >>>>>>>>>>>> removing the transaction id from the job specifications, as we
> >>> use a
> >>>>>>>> new
> >>>>>>>>>>>> transaction for each invocation of a deployed job.
> >>>>>>>>>>>>
> >>>>>>>>>>>> There is currently no clear way to remove the transaction id
> >> from
> >>>>>>> the
> >>>>>>>> job
> >>>>>>>>>>>> spec and keep the option for MultiTransactionJobletEventLis
> >>>>>>>> tenerFactory.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The question for the group is, do we see a need to maintain
> >> this
> >>>>>>> class
> >>>>>>>>>>> that
> >>>>>>>>>>>> will no longer be used by any current code? Or, an other
> words,
> >>> is
> >>>>>>>> there
> >>>>>>>>>>> a
> >>>>>>>>>>>> strong possibility that in the future we will want multiple
> >>>>>>>> transactions
> >>>>>>>>>>> to
> >>>>>>>>>>>> share a single Hyracks job, meaning that it is worth figuring
> >> out
> >>>>>>> how
> >>>>>>>> to
> >>>>>>>>>>>> maintain this class?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Steven
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: MultiTransactionJobletEventListenerFactory

Posted by abdullah alamoudi <ba...@gmail.com>.
I can e-meet anytime (I've moved to Sunnyvale). We can also look at a proposed design and see if it can work.
Back to my question: how were you planning to change the transaction id if we set aside the case with multiple datasets (the feed job)?
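One conceivable answer to the re-run question, consistent with the thread's goal of
removing the transaction id from the serialized specification, is to hand the id to
the job as a start-time parameter. This is a hypothetical sketch only, not the
project's actual mechanism; the parameter key and class names are invented:

```java
import java.io.Serializable;
import java.util.Map;

public class TxnParamSketch {
    static final String TXN_ID_PARAM = "txn-id"; // assumed parameter key

    // Stand-in for a commit operator factory baked into the deployed spec.
    static final class CommitFactory implements Serializable {
        // Resolved from the per-run parameter map at job start,
        // so each re-run of the same spec sees its own txn id.
        long resolveTxnId(Map<String, String> jobParameters) {
            return Long.parseLong(jobParameters.get(TXN_ID_PARAM));
        }
    }

    public static void main(String[] args) {
        CommitFactory factory = new CommitFactory();
        long run1 = factory.resolveTxnId(Map.of(TXN_ID_PARAM, "101"));
        long run2 = factory.resolveTxnId(Map.of(TXN_ID_PARAM, "102"));
        System.out.println(run1 + " " + run2); // 101 102
    }
}
```

Under this assumption the spec stays byte-identical across runs, and only the
parameter map passed at invocation time differs.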


> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <sj...@ucr.edu> wrote:
> 
> Maybe it would be good to have a meeting about this with all interested
> parties?
> 
> I can be on-campus at UCI on Tuesday if that would be a good day to meet.
> 
> Steven
> 
> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <ba...@gmail.com>
> wrote:
> 
>> Also, was wondering how would you do the same for a single dataset
>> (non-feed). How would you get the transaction id and change it when you
>> re-run?
>> 
>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hu...@gmail.com> wrote:
>> 
>>> For atomic transactions, the change was merged yesterday. For entity
>> level
>>> transactions, it should be a very small change.
>>> 
>>> Cheers,
>>> Murtadha
>>> 
>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <ba...@gmail.com>
>>> wrote:
>>>> 
>>>> I understand that is not the case right now but what you're working on?
>>>> 
>>>> Cheers,
>>>> Abdullah.
>>>> 
>>>> 
>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hu...@gmail.com>
>>> wrote:
>>>>> 
>>>>> A transaction context can register multiple primary indexes.
>>>>> Since each entity commit log contains the dataset id, you can
>> decrement
>>> the active operations on
>>>>> the operation tracker associated with that dataset id.
>>>>> 
>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com>
>> wrote:
>>>>> 
>>>>>  Can you illustrate how a deadlock can happen? I am anxious to know.
>>>>>  Moreover, the reason for the multiple transaction ids in feeds is
>> not
>>> simply because we compile them differently.
>>>>> 
>>>>>  How would a commit operator know which dataset active operation
>>> counter to decrement if they share the same id for example?
>>>>> 
>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
>>>>>> 
>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
>>> mappings for
>>>>>> the jobs and transactions, except for the feeds.
>>>>>> 
>>>>>> @Abdullah, after some digging into the code, I think probably we can
>>> use a
>>>>>> single transaction id for the job which feeds multiple datasets? See
>>> if I
>>>>>> can convince you. :)
>>>>>> 
>>>>>> The reason we have multiple transaction ids in feeds is that we
>> compile
>>>>>> each connection job separately and combine them into a single feed
>>> job. A
>>>>>> new transaction id is created and assigned to each connection job,
>>> thus for
>>>>>> the combined job, we have to handle the different transactions as
>> they
>>>>>> are embedded in the connection job specifications. But, what if we
>>> create a
>>>>>> single transaction id for the combined job? That transaction id will
>> be
>>>>>> embedded into each connection so they can write logs freely, but the
>>>>>> transaction will be started and committed only once as there is only
>>> one
>>>>>> feed job. In this way, we won't need multiTransactionJobletEventLis
>>> tener
>>>>>> and the transaction id can be removed from the job specification
>>> easily as
>>>>>> well (for Steven's change).
>>>>>> 
>>>>>> Best,
>>>>>> Xikui
>>>>>> 
>>>>>> 
>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com>
>>> wrote:
>>>>>>> 
>>>>>>> I worry about deadlocks.  The waits for graph may not understand
>> that
>>>>>>> making t1 wait will also make t2 wait since they may share a thread
>> -
>>>>>>> right?  Or do we have jobs and transactions separately represented
>>> there
>>>>>>> now?
>>>>>>> 
>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com>
>>> wrote:
>>>>>>>> 
>>>>>>>> We are using multiple transactions in a single job in case of feed
>>> and I
>>>>>>>> think that this is the correct way.
>>>>>>>> Having a single job for a feed that feeds into multiple datasets
>> is a
>>>>>>> good
>>>>>>>> thing since job resources/feed resources are consolidated.
>>>>>>>> 
>>>>>>>> Here are some points:
>>>>>>>> - We can't use the same transaction id to feed multiple datasets.
>> The
>>>>>>> only
>>>>>>>> other option is to have multiple jobs each feeding a different
>>> dataset.
>>>>>>>> - Having multiple jobs (in addition to the extra resources used,
>>> memory
>>>>>>>> and CPU) would then forces us to either read data from external
>>> sources
>>>>>>>> multiple times, parse records multiple times, etc
>>>>>>>> or having to have a synchronization between the different jobs and
>>> the
>>>>>>>> feed source within asterixdb. IMO, this is far more complicated
>> than
>>>>>>> having
>>>>>>>> multiple transactions within a single job and the cost far outweigh
>>> the
>>>>>>>> benefits.
>>>>>>>> 
>>>>>>>> P.S,
>>>>>>>> We are also using this for bucket connections in Couchbase
>> Analytics.
>>>>>>>> 
>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org>
>>> wrote:
>>>>>>>>> 
>>>>>>>>> If there are a number of issue with supporting multiple
>> transaction
>>> ids
>>>>>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
>>>>>>>>> Also, code that’s not being used has a tendency to "rot" and so I
>>> think
>>>>>>>>> that it’s usefulness might be limited by the time we’d find a use
>>> for
>>>>>>>>> this functionality.
>>>>>>>>> 
>>>>>>>>> My 2c,
>>>>>>>>> Till
>>>>>>>>> 
>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>>>>> 
>>>>>>>>>> I'm separating the connections into different jobs in some of my
>>>>>>>>>> experiments... but that was intended to be used for the
>>> experimental
>>>>>>>>>> settings (i.e., not for master now)...
>>>>>>>>>> 
>>>>>>>>>> I think the interesting question here is whether we want to allow
>>> one
>>>>>>>>>> Hyracks job to carry multiple transactions. I personally think
>> that
>>>>>>>> should
>>>>>>>>>> be allowed as the transaction and job are two separate concepts,
>>> but I
>>>>>>>>>> couldn't find such use cases other than the feeds. Does anyone
>>> have a
>>>>>>>> good
>>>>>>>>>> example on this?
>>>>>>>>>> 
>>>>>>>>>> Another question is, if we do allow multiple transactions in a
>>> single
>>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain the
>> correct
>>> TXN
>>>>>>>> id
>>>>>>>>>> without having that embedded as part of the job specification.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Xikui
>>>>>>>>>> 
>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>>>>>>> bamousaa@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I am curious as to how feed will work without this?
>>>>>>>>>>> 
>>>>>>>>>>> ~Abdullah.
>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu>
>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
>>> which
>>>>>>>>>>> allows
>>>>>>>>>>>> for one Hyracks job to run multiple Asterix transactions
>>> together.
>>>>>>>>>>>> 
>>>>>>>>>>>> This class is only used by feeds, and feeds are in process of
>>>>>>>> changing to
>>>>>>>>>>>> no longer need this feature. As part of the work in
>> pre-deploying
>>>>>>> job
>>>>>>>>>>>> specifications to be used by multiple hyracks jobs, I've been
>>>>>>> working
>>>>>>>> on
>>>>>>>>>>>> removing the transaction id from the job specifications, as we
>>> use a
>>>>>>>> new
>>>>>>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>>>>>> 
>>>>>>>>>>>> There is currently no clear way to remove the transaction id
>> from
>>>>>>> the
>>>>>>>> job
>>>>>>>>>>>> spec and keep the option for MultiTransactionJobletEventLis
>>>>>>>> tenerFactory.
>>>>>>>>>>>> 
>>>>>>>>>>>> The question for the group is, do we see a need to maintain
>> this
>>>>>>> class
>>>>>>>>>>> that
>>>>>>>>>>>> will no longer be used by any current code? Or, an other words,
>>> is
>>>>>>>> there
>>>>>>>>>>> a
>>>>>>>>>>>> strong possibility that in the future we will want multiple
>>>>>>>> transactions
>>>>>>>>>>> to
>>>>>>>>>>>> share a single Hyracks job, meaning that it is worth figuring
>> out
>>>>>>> how
>>>>>>>> to
>>>>>>>>>>>> maintain this class?
>>>>>>>>>>>> 
>>>>>>>>>>>> Steven
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 


Re: MultiTransactionJobletEventListenerFactory

Posted by Steven Jacobs <sj...@ucr.edu>.
Maybe it would be good to have a meeting about this with all interested
parties?

I can be on-campus at UCI on Tuesday if that would be a good day to meet.

Steven

On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <ba...@gmail.com>
wrote:

> Also, was wondering how would you do the same for a single dataset
> (non-feed). How would you get the transaction id and change it when you
> re-run?
>
> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hu...@gmail.com> wrote:
>
> > For atomic transactions, the change was merged yesterday. For entity
> level
> > transactions, it should be a very small change.
> >
> > Cheers,
> > Murtadha
> >
> > > On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <ba...@gmail.com>
> > wrote:
> > >
> > > I understand that is not the case right now but what you're working on?
> > >
> > > Cheers,
> > > Abdullah.
> > >
> > >
> > >> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hu...@gmail.com>
> > wrote:
> > >>
> > >> A transaction context can register multiple primary indexes.
> > >> Since each entity commit log contains the dataset id, you can
> decrement
> > the active operations on
> > >> the operation tracker associated with that dataset id.
> > >>
> > >> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com>
> wrote:
> > >>
> > >>   Can you illustrate how a deadlock can happen? I am anxious to know.
> > >>   Moreover, the reason for the multiple transaction ids in feeds is
> not
> > simply because we compile them differently.
> > >>
> > >>   How would a commit operator know which dataset active operation
> > counter to decrement if they share the same id for example?
> > >>
> > >>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
> > >>>
> > >>> Yes. That deadlock could happen. Currently, we have one-to-one
> > mappings for
> > >>> the jobs and transactions, except for the feeds.
> > >>>
> > >>> @Abdullah, after some digging into the code, I think probably we can
> > use a
> > >>> single transaction id for the job which feeds multiple datasets? See
> > if I
> > >>> can convince you. :)
> > >>>
> > >>> The reason we have multiple transaction ids in feeds is that we
> compile
> > >>> each connection job separately and combine them into a single feed
> > job. A
> > >>> new transaction id is created and assigned to each connection job,
> > thus for
> > >>> the combined job, we have to handle the different transactions as
> they
> > >>> are embedded in the connection job specifications. But, what if we
> > create a
> > >>> single transaction id for the combined job? That transaction id will
> be
> > >>> embedded into each connection so they can write logs freely, but the
> > >>> transaction will be started and committed only once as there is only
> > one
> > >>> feed job. In this way, we won't need multiTransactionJobletEventLis
> > tener
> > >>> and the transaction id can be removed from the job specification
> > easily as
> > >>> well (for Steven's change).
> > >>>
> > >>> Best,
> > >>> Xikui
> > >>>
> > >>>
> > >>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com>
> > wrote:
> > >>>>
> > >>>> I worry about deadlocks.  The waits for graph may not understand
> that
> > >>>> making t1 wait will also make t2 wait since they may share a thread
> -
> > >>>> right?  Or do we have jobs and transactions separately represented
> > there
> > >>>> now?
> > >>>>
> > >>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com>
> > wrote:
> > >>>>>
> > >>>>> We are using multiple transactions in a single job in case of feed
> > and I
> > >>>>> think that this is the correct way.
> > >>>>> Having a single job for a feed that feeds into multiple datasets
> is a
> > >>>> good
> > >>>>> thing since job resources/feed resources are consolidated.
> > >>>>>
> > >>>>> Here are some points:
> > >>>>> - We can't use the same transaction id to feed multiple datasets.
> The
> > >>>> only
> > >>>>> other option is to have multiple jobs each feeding a different
> > dataset.
> > >>>>> - Having multiple jobs (in addition to the extra resources used,
> > memory
> > >>>>> and CPU) would then forces us to either read data from external
> > sources
> > >>>>> multiple times, parse records multiple times, etc
> > >>>>> or having to have a synchronization between the different jobs and
> > the
> > >>>>> feed source within asterixdb. IMO, this is far more complicated
> than
> > >>>> having
> > >>>>> multiple transactions within a single job and the cost far outweigh
> > the
> > >>>>> benefits.
> > >>>>>
> > >>>>> P.S,
> > >>>>> We are also using this for bucket connections in Couchbase
> Analytics.
> > >>>>>
> > >>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org>
> > wrote:
> > >>>>>>
> > >>>>>> If there are a number of issue with supporting multiple
> transaction
> > ids
> > >>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
> > >>>>>> Also, code that’s not being used has a tendency to "rot" and so I
> > think
> > >>>>>> that it’s usefulness might be limited by the time we’d find a use
> > for
> > >>>>>> this functionality.
> > >>>>>>
> > >>>>>> My 2c,
> > >>>>>> Till
> > >>>>>>
> > >>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> > >>>>>>>
> > >>>>>>> I'm separating the connections into different jobs in some of my
> > >>>>>>> experiments... but that was intended to be used for the
> > experimental
> > >>>>>>> settings (i.e., not for master now)...
> > >>>>>>>
> > >>>>>>> I think the interesting question here is whether we want to allow
> > one
> > >>>>>>> Hyracks job to carry multiple transactions. I personally think
> that
> > >>>>> should
> > >>>>>>> be allowed as the transaction and job are two separate concepts,
> > but I
> > >>>>>>> couldn't find such use cases other than the feeds. Does anyone
> > have a
> > >>>>> good
> > >>>>>>> example on this?
> > >>>>>>>
> > >>>>>>> Another question is, if we do allow multiple transactions in a
> > single
> > >>>>>>> Hyracks job, how do we enable commit runtime to obtain the
> correct
> > TXN
> > >>>>> id
> > >>>>>>> without having that embedded as part of the job specification.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Xikui
> > >>>>>>>
> > >>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
> > >>>> bamousaa@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> I am curious as to how feed will work without this?
> > >>>>>>>>
> > >>>>>>>> ~Abdullah.
> > >>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu>
> > >>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi all,
> > >>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
> > which
> > >>>>>>>> allows
> > >>>>>>>>> for one Hyracks job to run multiple Asterix transactions
> > together.
> > >>>>>>>>>
> > >>>>>>>>> This class is only used by feeds, and feeds are in process of
> > >>>>> changing to
> > >>>>>>>>> no longer need this feature. As part of the work in
> pre-deploying
> > >>>> job
> > >>>>>>>>> specifications to be used by multiple hyracks jobs, I've been
> > >>>> working
> > >>>>> on
> > >>>>>>>>> removing the transaction id from the job specifications, as we
> > use a
> > >>>>> new
> > >>>>>>>>> transaction for each invocation of a deployed job.
> > >>>>>>>>>
> > >>>>>>>>> There is currently no clear way to remove the transaction id
> from
> > >>>> the
> > >>>>> job
> > >>>>>>>>> spec and keep the option for MultiTransactionJobletEventLis
> > >>>>> tenerFactory.
> > >>>>>>>>>
> > >>>>>>>>> The question for the group is, do we see a need to maintain
> this
> > >>>> class
> > >>>>>>>> that
> > >>>>>>>>> will no longer be used by any current code? Or, an other words,
> > is
> > >>>>> there
> > >>>>>>>> a
> > >>>>>>>>> strong possibility that in the future we will want multiple
> > >>>>> transactions
> > >>>>>>>> to
> > >>>>>>>>> share a single Hyracks job, meaning that it is worth figuring
> out
> > >>>> how
> > >>>>> to
> > >>>>>>>>> maintain this class?
> > >>>>>>>>>
> > >>>>>>>>> Steven
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>
> > >>
> > >>
> > >>
> > >
> >
>

Re: MultiTransactionJobletEventListenerFactory

Posted by abdullah alamoudi <ba...@gmail.com>.
Also, I was wondering how you would do the same for a single dataset
(non-feed). How would you get the transaction id and change it when you
re-run?

On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <hu...@gmail.com> wrote:

> For atomic transactions, the change was merged yesterday. For entity level
> transactions, it should be a very small change.
>
> Cheers,
> Murtadha
>
> > On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <ba...@gmail.com>
> wrote:
> >
> > I understand that is not the case right now but what you're working on?
> >
> > Cheers,
> > Abdullah.
> >
> >
> >> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hu...@gmail.com>
> wrote:
> >>
> >> A transaction context can register multiple primary indexes.
> >> Since each entity commit log contains the dataset id, you can decrement
> the active operations on
> >> the operation tracker associated with that dataset id.
> >>
> >> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:
> >>
> >>   Can you illustrate how a deadlock can happen? I am anxious to know.
> >>   Moreover, the reason for the multiple transaction ids in feeds is not
> simply because we compile them differently.
> >>
> >>   How would a commit operator know which dataset active operation
> counter to decrement if they share the same id for example?
> >>
> >>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
> >>>
> >>> Yes. That deadlock could happen. Currently, we have one-to-one
> mappings for
> >>> the jobs and transactions, except for the feeds.
> >>>
> >>> @Abdullah, after some digging into the code, I think probably we can
> use a
> >>> single transaction id for the job which feeds multiple datasets? See
> if I
> >>> can convince you. :)
> >>>
> >>> The reason we have multiple transaction ids in feeds is that we compile
> >>> each connection job separately and combine them into a single feed
> job. A
> >>> new transaction id is created and assigned to each connection job,
> thus for
> >>> the combined job, we have to handle the different transactions as they
> >>> are embedded in the connection job specifications. But, what if we
> create a
> >>> single transaction id for the combined job? That transaction id will be
> >>> embedded into each connection so they can write logs freely, but the
> >>> transaction will be started and committed only once as there is only
> one
> >>> feed job. In this way, we won't need multiTransactionJobletEventLis
> tener
> >>> and the transaction id can be removed from the job specification
> easily as
> >>> well (for Steven's change).
> >>>
> >>> Best,
> >>> Xikui
> >>>
> >>>
> >>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com>
> wrote:
> >>>>
> >>>> I worry about deadlocks.  The waits for graph may not understand that
> >>>> making t1 wait will also make t2 wait since they may share a thread -
> >>>> right?  Or do we have jobs and transactions separately represented
> there
> >>>> now?
> >>>>
> >>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com>
> wrote:
> >>>>>
> >>>>> We are using multiple transactions in a single job in case of feed
> and I
> >>>>> think that this is the correct way.
> >>>>> Having a single job for a feed that feeds into multiple datasets is a
> >>>> good
> >>>>> thing since job resources/feed resources are consolidated.
> >>>>>
> >>>>> Here are some points:
> >>>>> - We can't use the same transaction id to feed multiple datasets. The
> >>>> only
> >>>>> other option is to have multiple jobs each feeding a different
> dataset.
> >>>>> - Having multiple jobs (in addition to the extra resources used,
> memory
> >>>>> and CPU) would then forces us to either read data from external
> sources
> >>>>> multiple times, parse records multiple times, etc
> >>>>> or having to have a synchronization between the different jobs and
> the
> >>>>> feed source within asterixdb. IMO, this is far more complicated than
> >>>> having
> >>>>> multiple transactions within a single job and the cost far outweigh
> the
> >>>>> benefits.
> >>>>>
> >>>>> P.S,
> >>>>> We are also using this for bucket connections in Couchbase Analytics.
> >>>>>
> >>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org>
> wrote:
> >>>>>>
> >>>>>> If there are a number of issue with supporting multiple transaction
> ids
> >>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
> >>>>>> Also, code that’s not being used has a tendency to "rot" and so I
> think
> >>>>>> that it’s usefulness might be limited by the time we’d find a use
> for
> >>>>>> this functionality.
> >>>>>>
> >>>>>> My 2c,
> >>>>>> Till
> >>>>>>
> >>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>
> >>>>>>> I'm separating the connections into different jobs in some of my
> >>>>>>> experiments... but that was intended to be used for the
> experimental
> >>>>>>> settings (i.e., not for master now)...
> >>>>>>>
> >>>>>>> I think the interesting question here is whether we want to allow
> one
> >>>>>>> Hyracks job to carry multiple transactions. I personally think that
> >>>>> should
> >>>>>>> be allowed as the transaction and job are two separate concepts,
> but I
> >>>>>>> couldn't find such use cases other than the feeds. Does anyone
> have a
> >>>>> good
> >>>>>>> example on this?
> >>>>>>>
> >>>>>>> Another question is, if we do allow multiple transactions in a
> single
> >>>>>>> Hyracks job, how do we enable commit runtime to obtain the correct
> TXN
> >>>>> id
> >>>>>>> without having that embedded as part of the job specification.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Xikui
> >>>>>>>
> >>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
> >>>> bamousaa@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I am curious as to how feed will work without this?
> >>>>>>>>
> >>>>>>>> ~Abdullah.
> >>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu>
> >>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
> which
> >>>>>>>> allows
> >>>>>>>>> for one Hyracks job to run multiple Asterix transactions
> together.
> >>>>>>>>>
> >>>>>>>>> This class is only used by feeds, and feeds are in process of
> >>>>> changing to
> >>>>>>>>> no longer need this feature. As part of the work in pre-deploying
> >>>> job
> >>>>>>>>> specifications to be used by multiple hyracks jobs, I've been
> >>>> working
> >>>>> on
> >>>>>>>>> removing the transaction id from the job specifications, as we
> use a
> >>>>> new
> >>>>>>>>> transaction for each invocation of a deployed job.
> >>>>>>>>>
> >>>>>>>>> There is currently no clear way to remove the transaction id from
> >>>> the
> >>>>> job
> >>>>>>>>> spec and keep the option for MultiTransactionJobletEventLis
> >>>>> tenerFactory.
> >>>>>>>>>
> >>>>>>>>> The question for the group is, do we see a need to maintain this
> >>>> class
> >>>>>>>> that
> >>>>>>>>> will no longer be used by any current code? Or, an other words,
> is
> >>>>> there
> >>>>>>>> a
> >>>>>>>>> strong possibility that in the future we will want multiple
> >>>>> transactions
> >>>>>>>> to
> >>>>>>>>> share a single Hyracks job, meaning that it is worth figuring out
> >>>> how
> >>>>> to
> >>>>>>>>> maintain this class?
> >>>>>>>>>
> >>>>>>>>> Steven
> >>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >>
> >>
> >>
> >
>

Re: MultiTransactionJobletEventListenerFactory

Posted by Murtadha Hubail <hu...@gmail.com>.
For atomic transactions, the change was merged yesterday. For entity level transactions, it should be a very small change.

Cheers,
Murtadha

> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <ba...@gmail.com> wrote:
> 
> I understand that is not the case right now but what you're working on?
> 
> Cheers,
> Abdullah.
> 
> 
>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hu...@gmail.com> wrote:
>> 
>> A transaction context can register multiple primary indexes.
>> Since each entity commit log contains the dataset id, you can decrement the active operations on 
>> the operation tracker associated with that dataset id.
>> 
>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:
>> 
>>   Can you illustrate how a deadlock can happen? I am anxious to know.
>>   Moreover, the reason for the multiple transaction ids in feeds is not simply because we compile them differently.
>> 
>>   How would a commit operator know which dataset active operation counter to decrement if they share the same id for example?
>> 
>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
>>> 
>>> Yes. That deadlock could happen. Currently, we have one-to-one mappings for
>>> the jobs and transactions, except for the feeds.
>>> 
>>> @Abdullah, after some digging into the code, I think probably we can use a
>>> single transaction id for the job which feeds multiple datasets? See if I
>>> can convince you. :)
>>> 
>>> The reason we have multiple transaction ids in feeds is that we compile
>>> each connection job separately and combine them into a single feed job. A
>>> new transaction id is created and assigned to each connection job, thus for
>>> the combined job, we have to handle the different transactions as they
>>> are embedded in the connection job specifications. But, what if we create a
>>> single transaction id for the combined job? That transaction id will be
>>> embedded into each connection so they can write logs freely, but the
>>> transaction will be started and committed only once as there is only one
>>> feed job. In this way, we won't need multiTransactionJobletEventListener
>>> and the transaction id can be removed from the job specification easily as
>>> well (for Steven's change).
>>> 
>>> Best,
>>> Xikui
>>> 
>>> 
>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com> wrote:
>>>> 
>>>> I worry about deadlocks.  The waits for graph may not understand that
>>>> making t1 wait will also make t2 wait since they may share a thread -
>>>> right?  Or do we have jobs and transactions separately represented there
>>>> now?
>>>> 
>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:
>>>>> 
>>>>> We are using multiple transactions in a single job in case of feed and I
>>>>> think that this is the correct way.
>>>>> Having a single job for a feed that feeds into multiple datasets is a
>>>> good
>>>>> thing since job resources/feed resources are consolidated.
>>>>> 
>>>>> Here are some points:
>>>>> - We can't use the same transaction id to feed multiple datasets. The
>>>> only
>>>>> other option is to have multiple jobs each feeding a different dataset.
>>>>> - Having multiple jobs (in addition to the extra resources used, memory
>>>>> and CPU) would then forces us to either read data from external sources
>>>>> multiple times, parse records multiple times, etc
>>>>> or having to have a synchronization between the different jobs and the
>>>>> feed source within asterixdb. IMO, this is far more complicated than
>>>> having
>>>>> multiple transactions within a single job and the cost far outweigh the
>>>>> benefits.
>>>>> 
>>>>> P.S,
>>>>> We are also using this for bucket connections in Couchbase Analytics.
>>>>> 
>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org> wrote:
>>>>>> 
>>>>>> If there are a number of issue with supporting multiple transaction ids
>>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
>>>>>> Also, code that’s not being used has a tendency to "rot" and so I think
>>>>>> that it’s usefulness might be limited by the time we’d find a use for
>>>>>> this functionality.
>>>>>> 
>>>>>> My 2c,
>>>>>> Till
>>>>>> 
>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>> 
>>>>>>> I'm separating the connections into different jobs in some of my
>>>>>>> experiments... but that was intended to be used for the experimental
>>>>>>> settings (i.e., not for master now)...
>>>>>>> 
>>>>>>> I think the interesting question here is whether we want to allow one
>>>>>>> Hyracks job to carry multiple transactions. I personally think that
>>>>> should
>>>>>>> be allowed as the transaction and job are two separate concepts, but I
>>>>>>> couldn't find such use cases other than the feeds. Does anyone have a
>>>>> good
>>>>>>> example on this?
>>>>>>> 
>>>>>>> Another question is, if we do allow multiple transactions in a single
>>>>>>> Hyracks job, how do we enable commit runtime to obtain the correct TXN
>>>>> id
>>>>>>> without having that embedded as part of the job specification.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Xikui
>>>>>>> 
>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>>>> bamousaa@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> I am curious as to how feed will work without this?
>>>>>>>> 
>>>>>>>> ~Abdullah.
>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu>
>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which
>>>>>>>> allows
>>>>>>>>> for one Hyracks job to run multiple Asterix transactions together.
>>>>>>>>> 
>>>>>>>>> This class is only used by feeds, and feeds are in process of
>>>>> changing to
>>>>>>>>> no longer need this feature. As part of the work in pre-deploying
>>>> job
>>>>>>>>> specifications to be used by multiple hyracks jobs, I've been
>>>> working
>>>>> on
>>>>>>>>> removing the transaction id from the job specifications, as we use a
>>>>> new
>>>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>>> 
>>>>>>>>> There is currently no clear way to remove the transaction id from
>>>> the
>>>>> job
>>>>>>>>> spec and keep the option for MultiTransactionJobletEventLis
>>>>> tenerFactory.
>>>>>>>>> 
>>>>>>>>> The question for the group is, do we see a need to maintain this
>>>> class
>>>>>>>> that
>>>>>>>>> will no longer be used by any current code? Or, an other words, is
>>>>> there
>>>>>>>> a
>>>>>>>>> strong possibility that in the future we will want multiple
>>>>> transactions
>>>>>>>> to
>>>>>>>>> share a single Hyracks job, meaning that it is worth figuring out
>>>> how
>>>>> to
>>>>>>>>> maintain this class?
>>>>>>>>> 
>>>>>>>>> Steven
>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
>> 
>> 
>> 
> 

Re: MultiTransactionJobletEventListenerFactory

Posted by abdullah alamoudi <ba...@gmail.com>.
I understand that is not the case right now, but is that what you're working on?

Cheers,
Abdullah.


> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <hu...@gmail.com> wrote:
> 
> A transaction context can register multiple primary indexes.
> Since each entity commit log contains the dataset id, you can decrement the active operations on 
> the operation tracker associated with that dataset id.
> 
> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:
> 
>    Can you illustrate how a deadlock can happen? I am anxious to know.
>    Moreover, the reason for the multiple transaction ids in feeds is not simply because we compile them differently.
> 
>    How would a commit operator know which dataset active operation counter to decrement if they share the same id for example?
> 
>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
>> 
>> Yes. That deadlock could happen. Currently, we have one-to-one mappings for
>> the jobs and transactions, except for the feeds.
>> 
>> @Abdullah, after some digging into the code, I think probably we can use a
>> single transaction id for the job which feeds multiple datasets? See if I
>> can convince you. :)
>> 
>> The reason we have multiple transaction ids in feeds is that we compile
>> each connection job separately and combine them into a single feed job. A
>> new transaction id is created and assigned to each connection job, thus for
>> the combined job, we have to handle the different transactions as they
>> are embedded in the connection job specifications. But, what if we create a
>> single transaction id for the combined job? That transaction id will be
>> embedded into each connection so they can write logs freely, but the
>> transaction will be started and committed only once as there is only one
>> feed job. In this way, we won't need multiTransactionJobletEventListener
>> and the transaction id can be removed from the job specification easily as
>> well (for Steven's change).
>> 
>> Best,
>> Xikui
>> 
>> 
>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com> wrote:
>> 
>>> I worry about deadlocks.  The waits for graph may not understand that
>>> making t1 wait will also make t2 wait since they may share a thread -
>>> right?  Or do we have jobs and transactions separately represented there
>>> now?
>>> 
>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:
>>> 
>>>> We are using multiple transactions in a single job in case of feed and I
>>>> think that this is the correct way.
>>>> Having a single job for a feed that feeds into multiple datasets is a
>>> good
>>>> thing since job resources/feed resources are consolidated.
>>>> 
>>>> Here are some points:
>>>> - We can't use the same transaction id to feed multiple datasets. The
>>> only
>>>> other option is to have multiple jobs each feeding a different dataset.
>>>> - Having multiple jobs (in addition to the extra resources used, memory
>>>> and CPU) would then forces us to either read data from external sources
>>>> multiple times, parse records multiple times, etc
>>>> or having to have a synchronization between the different jobs and the
>>>> feed source within asterixdb. IMO, this is far more complicated than
>>> having
>>>> multiple transactions within a single job and the cost far outweigh the
>>>> benefits.
>>>> 
>>>> P.S,
>>>> We are also using this for bucket connections in Couchbase Analytics.
>>>> 
>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org> wrote:
>>>>> 
>>>>> If there are a number of issue with supporting multiple transaction ids
>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
>>>>> Also, code that’s not being used has a tendency to "rot" and so I think
>>>>> that it’s usefulness might be limited by the time we’d find a use for
>>>>> this functionality.
>>>>> 
>>>>> My 2c,
>>>>> Till
>>>>> 
>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>> 
>>>>>> I'm separating the connections into different jobs in some of my
>>>>>> experiments... but that was intended to be used for the experimental
>>>>>> settings (i.e., not for master now)...
>>>>>> 
>>>>>> I think the interesting question here is whether we want to allow one
>>>>>> Hyracks job to carry multiple transactions. I personally think that
>>>> should
>>>>>> be allowed as the transaction and job are two separate concepts, but I
>>>>>> couldn't find such use cases other than the feeds. Does anyone have a
>>>> good
>>>>>> example on this?
>>>>>> 
>>>>>> Another question is, if we do allow multiple transactions in a single
>>>>>> Hyracks job, how do we enable commit runtime to obtain the correct TXN
>>>> id
>>>>>> without having that embedded as part of the job specification.
>>>>>> 
>>>>>> Best,
>>>>>> Xikui
>>>>>> 
>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>>> bamousaa@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> I am curious as to how feed will work without this?
>>>>>>> 
>>>>>>> ~Abdullah.
>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu>
>>> wrote:
>>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which
>>>>>>> allows
>>>>>>>> for one Hyracks job to run multiple Asterix transactions together.
>>>>>>>> 
>>>>>>>> This class is only used by feeds, and feeds are in process of
>>>> changing to
>>>>>>>> no longer need this feature. As part of the work in pre-deploying
>>> job
>>>>>>>> specifications to be used by multiple hyracks jobs, I've been
>>> working
>>>> on
>>>>>>>> removing the transaction id from the job specifications, as we use a
>>>> new
>>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>> 
>>>>>>>> There is currently no clear way to remove the transaction id from
>>> the
>>>> job
>>>>>>>> spec and keep the option for MultiTransactionJobletEventLis
>>>> tenerFactory.
>>>>>>>> 
>>>>>>>> The question for the group is, do we see a need to maintain this
>>> class
>>>>>>> that
>>>>>>>> will no longer be used by any current code? Or, an other words, is
>>>> there
>>>>>>> a
>>>>>>>> strong possibility that in the future we will want multiple
>>>> transactions
>>>>>>> to
>>>>>>>> share a single Hyracks job, meaning that it is worth figuring out
>>> how
>>>> to
>>>>>>>> maintain this class?
>>>>>>>> 
>>>>>>>> Steven
>>>>>>> 
>>>>>>> 
>>>> 
>>>> 
>>> 
> 
> 
> 
> 


Re: MultiTransactionJobletEventListenerFactory

Posted by Murtadha Hubail <hu...@gmail.com>.
A transaction context can register multiple primary indexes.
Since each entity commit log contains the dataset id, you can decrement the active operations on 
the operation tracker associated with that dataset id.
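The mechanism described above can be sketched as follows: each entity commit log record carries its dataset id, and that id selects the operation tracker whose active-operation count gets decremented, so one transaction context can span multiple primary indexes. The class and field names below are illustrative stand-ins, not the real AsterixDB types:

```python
class OperationTracker:
    """Per-dataset counter of active operations (illustrative)."""
    def __init__(self):
        self.active_ops = 0

class EntityCommitLog:
    """An entity-level commit log record; carries the dataset id."""
    def __init__(self, txn_id, dataset_id):
        self.txn_id = txn_id
        self.dataset_id = dataset_id

def on_entity_commit(log, trackers):
    # The dataset id in the log record picks the right tracker,
    # even when several datasets share one transaction context.
    trackers[log.dataset_id].active_ops -= 1

trackers = {101: OperationTracker(), 102: OperationTracker()}
trackers[101].active_ops = 2
trackers[102].active_ops = 1
on_entity_commit(EntityCommitLog(txn_id=7, dataset_id=101), trackers)
```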

On 17/11/2017, 5:52 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:

    Can you illustrate how a deadlock can happen? I am anxious to know.
    Moreover, the reason for the multiple transaction ids in feeds is not simply because we compile them differently.
    
    How would a commit operator know which dataset active operation counter to decrement if they share the same id for example?
    
    > On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
    > 
    > Yes. That deadlock could happen. Currently, we have one-to-one mappings for
    > the jobs and transactions, except for the feeds.
    > 
    > @Abdullah, after some digging into the code, I think probably we can use a
    > single transaction id for the job which feeds multiple datasets? See if I
    > can convince you. :)
    > 
    > The reason we have multiple transaction ids in feeds is that we compile
    > each connection job separately and combine them into a single feed job. A
    > new transaction id is created and assigned to each connection job, thus for
    > the combined job, we have to handle the different transactions as they
    > are embedded in the connection job specifications. But, what if we create a
    > single transaction id for the combined job? That transaction id will be
    > embedded into each connection so they can write logs freely, but the
    > transaction will be started and committed only once as there is only one
    > feed job. In this way, we won't need multiTransactionJobletEventListener
    > and the transaction id can be removed from the job specification easily as
    > well (for Steven's change).
    > 
    > Best,
    > Xikui
    > 
    > 
    > On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com> wrote:
    > 
    >> I worry about deadlocks.  The waits for graph may not understand that
    >> making t1 wait will also make t2 wait since they may share a thread -
    >> right?  Or do we have jobs and transactions separately represented there
    >> now?
    >> 
    >> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:
    >> 
    >>> We are using multiple transactions in a single job in case of feed and I
    >>> think that this is the correct way.
    >>> Having a single job for a feed that feeds into multiple datasets is a
    >> good
    >>> thing since job resources/feed resources are consolidated.
    >>> 
    >>> Here are some points:
    >>> - We can't use the same transaction id to feed multiple datasets. The
    >> only
    >>> other option is to have multiple jobs each feeding a different dataset.
    >>> - Having multiple jobs (in addition to the extra resources used, memory
    >>> and CPU) would then forces us to either read data from external sources
    >>> multiple times, parse records multiple times, etc
    >>>  or having to have a synchronization between the different jobs and the
    >>> feed source within asterixdb. IMO, this is far more complicated than
    >> having
    >>> multiple transactions within a single job and the cost far outweigh the
    >>> benefits.
    >>> 
    >>> P.S,
    >>> We are also using this for bucket connections in Couchbase Analytics.
    >>> 
    >>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org> wrote:
    >>>> 
    >>>> If there are a number of issue with supporting multiple transaction ids
    >>>> and no clear benefits/use-cases, I’d vote for simplification :)
    >>>> Also, code that’s not being used has a tendency to "rot" and so I think
    >>>> that it’s usefulness might be limited by the time we’d find a use for
    >>>> this functionality.
    >>>> 
    >>>> My 2c,
    >>>> Till
    >>>> 
    >>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
    >>>> 
    >>>>> I'm separating the connections into different jobs in some of my
    >>>>> experiments... but that was intended to be used for the experimental
    >>>>> settings (i.e., not for master now)...
    >>>>> 
    >>>>> I think the interesting question here is whether we want to allow one
    >>>>> Hyracks job to carry multiple transactions. I personally think that
    >>> should
    >>>>> be allowed as the transaction and job are two separate concepts, but I
    >>>>> couldn't find such use cases other than the feeds. Does anyone have a
    >>> good
    >>>>> example on this?
    >>>>> 
    >>>>> Another question is, if we do allow multiple transactions in a single
    >>>>> Hyracks job, how do we enable commit runtime to obtain the correct TXN
    >>> id
    >>>>> without having that embedded as part of the job specification.
    >>>>> 
    >>>>> Best,
    >>>>> Xikui
    >>>>> 
    >>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
    >> bamousaa@gmail.com>
    >>>>> wrote:
    >>>>> 
    >>>>>> I am curious as to how feed will work without this?
    >>>>>> 
    >>>>>> ~Abdullah.
    >>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu>
    >> wrote:
    >>>>>>> 
    >>>>>>> Hi all,
    >>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which
    >>>>>> allows
    >>>>>>> for one Hyracks job to run multiple Asterix transactions together.
    >>>>>>> 
    >>>>>>> This class is only used by feeds, and feeds are in process of
    >>> changing to
    >>>>>>> no longer need this feature. As part of the work in pre-deploying
    >> job
    >>>>>>> specifications to be used by multiple hyracks jobs, I've been
    >> working
    >>> on
    >>>>>>> removing the transaction id from the job specifications, as we use a
    >>> new
    >>>>>>> transaction for each invocation of a deployed job.
    >>>>>>> 
    >>>>>>> There is currently no clear way to remove the transaction id from
    >> the
    >>> job
    >>>>>>> spec and keep the option for MultiTransactionJobletEventLis
    >>> tenerFactory.
    >>>>>>> 
    >>>>>>> The question for the group is, do we see a need to maintain this
    >> class
    >>>>>> that
    >>>>>>> will no longer be used by any current code? Or, an other words, is
    >>> there
    >>>>>> a
    >>>>>>> strong possibility that in the future we will want multiple
    >>> transactions
    >>>>>> to
    >>>>>>> share a single Hyracks job, meaning that it is worth figuring out
    >> how
    >>> to
    >>>>>>> maintain this class?
    >>>>>>> 
    >>>>>>> Steven
    >>>>>> 
    >>>>>> 
    >>> 
    >>> 
    >> 
    
    



Re: MultiTransactionJobletEventListenerFactory

Posted by abdullah alamoudi <ba...@gmail.com>.
Can you illustrate how a deadlock can happen? I am anxious to know.
Moreover, the reason for the multiple transaction ids in feeds is not simply because we compile them differently.

How would a commit operator know which dataset active operation counter to decrement if they share the same id for example?
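One answer proposed elsewhere in this thread is a dataset-id-to-transaction-id map kept by the joblet event listener: the commit runtime knows its own dataset id, looks up its sub-transaction id once per job, and uses it for every record thereafter. A hypothetical sketch (these are not the real classes):

```python
class MultiTxnListener:
    """Illustrative stand-in for MultiTransactionJobletEventListener:
    holds one sub-transaction id per dataset fed by the job."""
    def __init__(self, dataset_to_txn):
        self._dataset_to_txn = dict(dataset_to_txn)

    def txn_for(self, dataset_id):
        return self._dataset_to_txn[dataset_id]

class CommitRuntime:
    """Resolves its transaction id once per job, not once per record."""
    def __init__(self, dataset_id, listener):
        self.dataset_id = dataset_id
        self.txn_id = listener.txn_for(dataset_id)  # single lookup at open time

    def commit(self, record):
        # every record is committed under the id resolved above
        return (self.txn_id, record)

listener = MultiTxnListener({101: 7, 102: 8})
runtime = CommitRuntime(dataset_id=102, listener=listener)
```

This assumes a feed is not connected to the same dataset twice within one job, so dataset id uniquely identifies the sub-transaction.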

> On Nov 16, 2017, at 9:46 PM, Xikui Wang <xi...@uci.edu> wrote:
> 
> Yes. That deadlock could happen. Currently, we have one-to-one mappings for
> the jobs and transactions, except for the feeds.
> 
> @Abdullah, after some digging into the code, I think probably we can use a
> single transaction id for the job which feeds multiple datasets? See if I
> can convince you. :)
> 
> The reason we have multiple transaction ids in feeds is that we compile
> each connection job separately and combine them into a single feed job. A
> new transaction id is created and assigned to each connection job, thus for
> the combined job, we have to handle the different transactions as they
> are embedded in the connection job specifications. But, what if we create a
> single transaction id for the combined job? That transaction id will be
> embedded into each connection so they can write logs freely, but the
> transaction will be started and committed only once as there is only one
> feed job. In this way, we won't need multiTransactionJobletEventListener
> and the transaction id can be removed from the job specification easily as
> well (for Steven's change).
> 
> Best,
> Xikui
> 
> 
> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com> wrote:
> 
>> I worry about deadlocks.  The waits for graph may not understand that
>> making t1 wait will also make t2 wait since they may share a thread -
>> right?  Or do we have jobs and transactions separately represented there
>> now?
>> 
>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:
>> 
>>> We are using multiple transactions in a single job in case of feed and I
>>> think that this is the correct way.
>>> Having a single job for a feed that feeds into multiple datasets is a
>> good
>>> thing since job resources/feed resources are consolidated.
>>> 
>>> Here are some points:
>>> - We can't use the same transaction id to feed multiple datasets. The
>> only
>>> other option is to have multiple jobs each feeding a different dataset.
>>> - Having multiple jobs (in addition to the extra resources used, memory
>>> and CPU) would then forces us to either read data from external sources
>>> multiple times, parse records multiple times, etc
>>>  or having to have a synchronization between the different jobs and the
>>> feed source within asterixdb. IMO, this is far more complicated than
>> having
>>> multiple transactions within a single job and the cost far outweigh the
>>> benefits.
>>> 
>>> P.S,
>>> We are also using this for bucket connections in Couchbase Analytics.
>>> 
>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org> wrote:
>>>> 
>>>> If there are a number of issue with supporting multiple transaction ids
>>>> and no clear benefits/use-cases, I’d vote for simplification :)
>>>> Also, code that’s not being used has a tendency to "rot" and so I think
>>>> that it’s usefulness might be limited by the time we’d find a use for
>>>> this functionality.
>>>> 
>>>> My 2c,
>>>> Till
>>>> 
>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>> 
>>>>> I'm separating the connections into different jobs in some of my
>>>>> experiments... but that was intended to be used for the experimental
>>>>> settings (i.e., not for master now)...
>>>>> 
>>>>> I think the interesting question here is whether we want to allow one
>>>>> Hyracks job to carry multiple transactions. I personally think that
>>> should
>>>>> be allowed as the transaction and job are two separate concepts, but I
>>>>> couldn't find such use cases other than the feeds. Does anyone have a
>>> good
>>>>> example on this?
>>>>> 
>>>>> Another question is, if we do allow multiple transactions in a single
>>>>> Hyracks job, how do we enable commit runtime to obtain the correct TXN
>>> id
>>>>> without having that embedded as part of the job specification.
>>>>> 
>>>>> Best,
>>>>> Xikui
>>>>> 
>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>> bamousaa@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I am curious as to how feed will work without this?
>>>>>> 
>>>>>> ~Abdullah.
>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu>
>> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which
>>>>>> allows
>>>>>>> for one Hyracks job to run multiple Asterix transactions together.
>>>>>>> 
>>>>>>> This class is only used by feeds, and feeds are in process of
>>> changing to
>>>>>>> no longer need this feature. As part of the work in pre-deploying
>> job
>>>>>>> specifications to be used by multiple hyracks jobs, I've been
>> working
>>> on
>>>>>>> removing the transaction id from the job specifications, as we use a
>>> new
>>>>>>> transaction for each invocation of a deployed job.
>>>>>>> 
>>>>>>> There is currently no clear way to remove the transaction id from
>> the
>>> job
>>>>>>> spec and keep the option for MultiTransactionJobletEventLis
>>> tenerFactory.
>>>>>>> 
>>>>>>> The question for the group is, do we see a need to maintain this
>> class
>>>>>> that
>>>>>>> will no longer be used by any current code? Or, an other words, is
>>> there
>>>>>> a
>>>>>>> strong possibility that in the future we will want multiple
>>> transactions
>>>>>> to
>>>>>>> share a single Hyracks job, meaning that it is worth figuring out
>> how
>>> to
>>>>>>> maintain this class?
>>>>>>> 
>>>>>>> Steven
>>>>>> 
>>>>>> 
>>> 
>>> 
>> 


Re: MultiTransactionJobletEventListenerFactory

Posted by Mike Carey <dt...@gmail.com>.
This makes good sense to me!  (But I'm not sufficiently expert on the 
code to know for sure; I just know that danger seems to lurk in deadlock 
land if the detection model doesn't have enough of an understanding of 
who the actors are and what blocking might do.  It may be that our 
transactor notion has this case covered too, but I'd be a little 
surprised if it does.)


On 11/16/17 9:46 PM, Xikui Wang wrote:
> Yes. That deadlock could happen. Currently, we have one-to-one mappings for
> the jobs and transactions, except for the feeds.
>
> @Abdullah, after some digging into the code, I think probably we can use a
> single transaction id for the job which feeds multiple datasets? See if I
> can convince you. :)
>
> The reason we have multiple transaction ids in feeds is that we compile
> each connection job separately and combine them into a single feed job. A
> new transaction id is created and assigned to each connection job, thus for
> the combined job, we have to handle the different transactions as they
> are embedded in the connection job specifications. But, what if we create a
> single transaction id for the combined job? That transaction id will be
> embedded into each connection so they can write logs freely, but the
> transaction will be started and committed only once as there is only one
> feed job. In this way, we won't need multiTransactionJobletEventListener
> and the transaction id can be removed from the job specification easily as
> well (for Steven's change).
>
> Best,
> Xikui
>
>
> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com> wrote:
>
>> I worry about deadlocks.  The waits for graph may not understand that
>> making t1 wait will also make t2 wait since they may share a thread -
>> right?  Or do we have jobs and transactions separately represented there
>> now?
>>
>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:
>>
>>> We are using multiple transactions in a single job in case of feed and I
>>> think that this is the correct way.
>>> Having a single job for a feed that feeds into multiple datasets is a
>> good
>>> thing since job resources/feed resources are consolidated.
>>>
>>> Here are some points:
>>> - We can't use the same transaction id to feed multiple datasets. The
>> only
>>> other option is to have multiple jobs each feeding a different dataset.
>>> - Having multiple jobs (in addition to the extra resources used, memory
>>> and CPU) would then forces us to either read data from external sources
>>> multiple times, parse records multiple times, etc
>>>    or having to have a synchronization between the different jobs and the
>>> feed source within asterixdb. IMO, this is far more complicated than
>> having
>>> multiple transactions within a single job and the cost far outweigh the
>>> benefits.
>>>
>>> P.S,
>>> We are also using this for bucket connections in Couchbase Analytics.
>>>
>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org> wrote:
>>>>
>>>> If there are a number of issue with supporting multiple transaction ids
>>>> and no clear benefits/use-cases, I’d vote for simplification :)
>>>> Also, code that’s not being used has a tendency to "rot" and so I think
>>>> that it’s usefulness might be limited by the time we’d find a use for
>>>> this functionality.
>>>>
>>>> My 2c,
>>>> Till
>>>>
>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>
>>>>> I'm separating the connections into different jobs in some of my
>>>>> experiments... but that was intended to be used for the experimental
>>>>> settings (i.e., not for master now)...
>>>>>
>>>>> I think the interesting question here is whether we want to allow one
>>>>> Hyracks job to carry multiple transactions. I personally think that
>>> should
>>>>> be allowed as the transaction and job are two separate concepts, but I
>>>>> couldn't find such use cases other than the feeds. Does anyone have a
>>> good
>>>>> example on this?
>>>>>
>>>>> Another question is, if we do allow multiple transactions in a single
>>>>> Hyracks job, how do we enable commit runtime to obtain the correct TXN
>>> id
>>>>> without having that embedded as part of the job specification.
>>>>>
>>>>> Best,
>>>>> Xikui
>>>>>
>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>> bamousaa@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I am curious as to how feed will work without this?
>>>>>>
>>>>>> ~Abdullah.
>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu>
>> wrote:
>>>>>>> Hi all,
>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which
>>>>>> allows
>>>>>>> for one Hyracks job to run multiple Asterix transactions together.
>>>>>>>
>>>>>>> This class is only used by feeds, and feeds are in process of
>>> changing to
>>>>>>> no longer need this feature. As part of the work in pre-deploying
>> job
>>>>>>> specifications to be used by multiple hyracks jobs, I've been
>> working
>>> on
>>>>>>> removing the transaction id from the job specifications, as we use a
>>> new
>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>
>>>>>>> There is currently no clear way to remove the transaction id from
>> the
>>> job
>>>>>>> spec and keep the option for MultiTransactionJobletEventLis
>>> tenerFactory.
>>>>>>> The question for the group is, do we see a need to maintain this
>> class
>>>>>> that
>>>>>>> will no longer be used by any current code? Or, an other words, is
>>> there
>>>>>> a
>>>>>>> strong possibility that in the future we will want multiple
>>> transactions
>>>>>> to
>>>>>>> share a single Hyracks job, meaning that it is worth figuring out
>> how
>>> to
>>>>>>> maintain this class?
>>>>>>>
>>>>>>> Steven
>>>>>>
>>>


Re: MultiTransactionJobletEventListenerFactory

Posted by Xikui Wang <xi...@uci.edu>.
Yes. That deadlock could happen. Currently, we have a one-to-one mapping
between jobs and transactions, except for the feeds.

@Abdullah, after some digging into the code, I think probably we can use a
single transaction id for the job which feeds multiple datasets? See if I
can convince you. :)

The reason we have multiple transaction ids in feeds is that we compile
each connection job separately and combine them into a single feed job. A
new transaction id is created and assigned to each connection job, thus for
the combined job, we have to handle the different transactions as they
are embedded in the connection job specifications. But, what if we create a
single transaction id for the combined job? That transaction id will be
embedded into each connection so they can write logs freely, but the
transaction will be started and committed only once as there is only one
feed job. In this way, we won't need the MultiTransactionJobletEventListener
and the transaction id can be removed from the job specification easily as
well (for Steven's change).
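A minimal sketch of this idea (hypothetical names, not AsterixDB's actual classes): one transaction id is minted when the combined feed job is compiled and embedded into every connection, so logging can proceed per connection while the transaction is begun and committed only once.

```python
class TxnIdFactory:
    """Hypothetical id source; stands in for AsterixDB's transaction id generator."""

    def __init__(self):
        self._next = 0

    def create(self):
        self._next += 1
        return self._next


def compile_feed_job(datasets, txn_factory):
    # One id for the whole combined job, embedded into each connection
    # so every connection can write log records under the same txn.
    txn_id = txn_factory.create()
    return {
        "txn_id": txn_id,
        "connections": [{"dataset": d, "txn_id": txn_id} for d in datasets],
    }


factory = TxnIdFactory()
job = compile_feed_job(["ds1", "ds2", "ds3"], factory)

# Every connection carries the same id, so a plain (single-transaction)
# job event listener suffices: begin once at job start, commit once at job end.
assert len({c["txn_id"] for c in job["connections"]}) == 1
```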

Best,
Xikui


On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <dt...@gmail.com> wrote:

> I worry about deadlocks.  The waits for graph may not understand that
> making t1 wait will also make t2 wait since they may share a thread -
> right?  Or do we have jobs and transactions separately represented there
> now?
>
> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:
>
> > We are using multiple transactions in a single job in case of feed and I
> > think that this is the correct way.
> > Having a single job for a feed that feeds into multiple datasets is a
> good
> > thing since job resources/feed resources are consolidated.
> >
> > Here are some points:
> > - We can't use the same transaction id to feed multiple datasets. The
> only
> > other option is to have multiple jobs each feeding a different dataset.
> > - Having multiple jobs (in addition to the extra resources used, memory
> > and CPU) would then forces us to either read data from external sources
> > multiple times, parse records multiple times, etc
> >   or having to have a synchronization between the different jobs and the
> > feed source within asterixdb. IMO, this is far more complicated than
> having
> > multiple transactions within a single job and the cost far outweigh the
> > benefits.
> >
> > P.S,
> > We are also using this for bucket connections in Couchbase Analytics.
> >
> > > On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org> wrote:
> > >
> > > If there are a number of issue with supporting multiple transaction ids
> > > and no clear benefits/use-cases, I’d vote for simplification :)
> > > Also, code that’s not being used has a tendency to "rot" and so I think
> > > that it’s usefulness might be limited by the time we’d find a use for
> > > this functionality.
> > >
> > > My 2c,
> > > Till
> > >
> > > On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> > >
> > >> I'm separating the connections into different jobs in some of my
> > >> experiments... but that was intended to be used for the experimental
> > >> settings (i.e., not for master now)...
> > >>
> > >> I think the interesting question here is whether we want to allow one
> > >> Hyracks job to carry multiple transactions. I personally think that
> > should
> > >> be allowed as the transaction and job are two separate concepts, but I
> > >> couldn't find such use cases other than the feeds. Does anyone have a
> > good
> > >> example on this?
> > >>
> > >> Another question is, if we do allow multiple transactions in a single
> > >> Hyracks job, how do we enable commit runtime to obtain the correct TXN
> > id
> > >> without having that embedded as part of the job specification.
> > >>
> > >> Best,
> > >> Xikui
> > >>
> > >> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
> bamousaa@gmail.com>
> > >> wrote:
> > >>
> > >>> I am curious as to how feed will work without this?
> > >>>
> > >>> ~Abdullah.
> > >>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu>
> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>> We currently have MultiTransactionJobletEventListenerFactory, which
> > >>> allows
> > >>>> for one Hyracks job to run multiple Asterix transactions together.
> > >>>>
> > >>>> This class is only used by feeds, and feeds are in process of
> > changing to
> > >>>> no longer need this feature. As part of the work in pre-deploying
> job
> > >>>> specifications to be used by multiple hyracks jobs, I've been
> working
> > on
> > >>>> removing the transaction id from the job specifications, as we use a
> > new
> > >>>> transaction for each invocation of a deployed job.
> > >>>>
> > >>>> There is currently no clear way to remove the transaction id from
> the
> > job
> > >>>> spec and keep the option for MultiTransactionJobletEventLis
> > tenerFactory.
> > >>>>
> > >>>> The question for the group is, do we see a need to maintain this
> class
> > >>> that
> > >>>> will no longer be used by any current code? Or, an other words, is
> > there
> > >>> a
> > >>>> strong possibility that in the future we will want multiple
> > transactions
> > >>> to
> > >>>> share a single Hyracks job, meaning that it is worth figuring out
> how
> > to
> > >>>> maintain this class?
> > >>>>
> > >>>> Steven
> > >>>
> > >>>
> >
> >
>

Re: MultiTransactionJobletEventListenerFactory

Posted by Mike Carey <dt...@gmail.com>.
I worry about deadlocks.  The waits-for graph may not understand that
making t1 wait will also make t2 wait, since they may share a thread -
right?  Or do we have jobs and transactions separately represented there
now?
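A toy illustration of this concern (hypothetical, plain cycle detection over a waits-for graph): the lock-level graph alone shows no cycle, but once the implicit "t1 and t2 share a thread" edge is added, a cycle appears that a lock-only detector would miss.

```python
def has_cycle(edges):
    """DFS cycle detection over a directed graph given as {node: [successors]}."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in edges}

    def visit(n):
        color[n] = GRAY
        for m in edges.get(n, []):
            if color.get(m, WHITE) == GRAY:
                return True  # back edge: cycle found
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(visit(n) for n in edges if color[n] == WHITE)


# Lock-level view only: t1 waits for a lock that t2 holds. No cycle,
# so the deadlock detector reports nothing.
lock_waits = {"t1": ["t2"], "t2": []}
assert not has_cycle(lock_waits)

# But if t1 and t2 run on the same thread of one Hyracks job, blocking t1
# also blocks t2 -- an implicit t2 -> t1 wait. Now there is a cycle:
# an undetected deadlock.
with_thread_edge = {"t1": ["t2"], "t2": ["t1"]}
assert has_cycle(with_thread_edge)
```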

On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <ba...@gmail.com> wrote:

> We are using multiple transactions in a single job in case of feed and I
> think that this is the correct way.
> Having a single job for a feed that feeds into multiple datasets is a good
> thing since job resources/feed resources are consolidated.
>
> Here are some points:
> - We can't use the same transaction id to feed multiple datasets. The only
> other option is to have multiple jobs each feeding a different dataset.
> - Having multiple jobs (in addition to the extra resources used, memory
> and CPU) would then forces us to either read data from external sources
> multiple times, parse records multiple times, etc
>   or having to have a synchronization between the different jobs and the
> feed source within asterixdb. IMO, this is far more complicated than having
> multiple transactions within a single job and the cost far outweigh the
> benefits.
>
> P.S,
> We are also using this for bucket connections in Couchbase Analytics.
>
> > On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org> wrote:
> >
> > If there are a number of issue with supporting multiple transaction ids
> > and no clear benefits/use-cases, I’d vote for simplification :)
> > Also, code that’s not being used has a tendency to "rot" and so I think
> > that it’s usefulness might be limited by the time we’d find a use for
> > this functionality.
> >
> > My 2c,
> > Till
> >
> > On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >
> >> I'm separating the connections into different jobs in some of my
> >> experiments... but that was intended to be used for the experimental
> >> settings (i.e., not for master now)...
> >>
> >> I think the interesting question here is whether we want to allow one
> >> Hyracks job to carry multiple transactions. I personally think that
> should
> >> be allowed as the transaction and job are two separate concepts, but I
> >> couldn't find such use cases other than the feeds. Does anyone have a
> good
> >> example on this?
> >>
> >> Another question is, if we do allow multiple transactions in a single
> >> Hyracks job, how do we enable commit runtime to obtain the correct TXN
> id
> >> without having that embedded as part of the job specification.
> >>
> >> Best,
> >> Xikui
> >>
> >> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <ba...@gmail.com>
> >> wrote:
> >>
> >>> I am curious as to how feed will work without this?
> >>>
> >>> ~Abdullah.
> >>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu> wrote:
> >>>>
> >>>> Hi all,
> >>>> We currently have MultiTransactionJobletEventListenerFactory, which
> >>> allows
> >>>> for one Hyracks job to run multiple Asterix transactions together.
> >>>>
> >>>> This class is only used by feeds, and feeds are in process of
> changing to
> >>>> no longer need this feature. As part of the work in pre-deploying job
> >>>> specifications to be used by multiple hyracks jobs, I've been working
> on
> >>>> removing the transaction id from the job specifications, as we use a
> new
> >>>> transaction for each invocation of a deployed job.
> >>>>
> >>>> There is currently no clear way to remove the transaction id from the
> job
> >>>> spec and keep the option for MultiTransactionJobletEventLis
> tenerFactory.
> >>>>
> >>>> The question for the group is, do we see a need to maintain this class
> >>> that
> >>>> will no longer be used by any current code? Or, an other words, is
> there
> >>> a
> >>>> strong possibility that in the future we will want multiple
> transactions
> >>> to
> >>>> share a single Hyracks job, meaning that it is worth figuring out how
> to
> >>>> maintain this class?
> >>>>
> >>>> Steven
> >>>
> >>>
>
>

Re: MultiTransactionJobletEventListenerFactory

Posted by Taewoo Kim <wa...@gmail.com>.
Not sure whether this conversation is related to the concept of
"transactor" in
https://cwiki.apache.org/confluence/display/ASTERIXDB/Deadlock-Free+Locking+Protocol
.

Best,
Taewoo

On Thu, Nov 16, 2017 at 3:41 PM, Xikui Wang <xi...@uci.edu> wrote:

> How about we separate the ingestion part from the rest? We can create Job0
> for the ingestion which takes data from the datasource, and create Job1,
> Job2, ... for the connections to dataset1, dataset2, dataset3
> respectively... We would need to pay the resource overhead still, but the
> synchronization can be avoided. (I'm in the same camp with you, Abdullah. I
> just want to pick up your brain to see how far this idea can go. :) )
>
> If we want to keep multiple transactions in a single job and keep the
> transaction id out of the job specification, we need to let the commit
> runtime get the right transaction id from somewhere... Any good idea on
> this?
>
> Best,
> Xikui
>
> On Thu, Nov 16, 2017 at 3:10 PM, abdullah alamoudi <ba...@gmail.com>
> wrote:
>
> > We are using multiple transactions in a single job in case of feed and I
> > think that this is the correct way.
> > Having a single job for a feed that feeds into multiple datasets is a
> good
> > thing since job resources/feed resources are consolidated.
> >
> > Here are some points:
> > - We can't use the same transaction id to feed multiple datasets. The
> only
> > other option is to have multiple jobs each feeding a different dataset.
> > - Having multiple jobs (in addition to the extra resources used, memory
> > and CPU) would then forces us to either read data from external sources
> > multiple times, parse records multiple times, etc
> >   or having to have a synchronization between the different jobs and the
> > feed source within asterixdb. IMO, this is far more complicated than
> having
> > multiple transactions within a single job and the cost far outweigh the
> > benefits.
> >
> > P.S,
> > We are also using this for bucket connections in Couchbase Analytics.
> >
> > > On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org> wrote:
> > >
> > > If there are a number of issue with supporting multiple transaction ids
> > > and no clear benefits/use-cases, I’d vote for simplification :)
> > > Also, code that’s not being used has a tendency to "rot" and so I think
> > > that it’s usefulness might be limited by the time we’d find a use for
> > > this functionality.
> > >
> > > My 2c,
> > > Till
> > >
> > > On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> > >
> > >> I'm separating the connections into different jobs in some of my
> > >> experiments... but that was intended to be used for the experimental
> > >> settings (i.e., not for master now)...
> > >>
> > >> I think the interesting question here is whether we want to allow one
> > >> Hyracks job to carry multiple transactions. I personally think that
> > should
> > >> be allowed as the transaction and job are two separate concepts, but I
> > >> couldn't find such use cases other than the feeds. Does anyone have a
> > good
> > >> example on this?
> > >>
> > >> Another question is, if we do allow multiple transactions in a single
> > >> Hyracks job, how do we enable commit runtime to obtain the correct TXN
> > id
> > >> without having that embedded as part of the job specification.
> > >>
> > >> Best,
> > >> Xikui
> > >>
> > >> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
> bamousaa@gmail.com>
> > >> wrote:
> > >>
> > >>> I am curious as to how feed will work without this?
> > >>>
> > >>> ~Abdullah.
> > >>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu>
> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>> We currently have MultiTransactionJobletEventListenerFactory, which
> > >>> allows
> > >>>> for one Hyracks job to run multiple Asterix transactions together.
> > >>>>
> > >>>> This class is only used by feeds, and feeds are in process of
> > changing to
> > >>>> no longer need this feature. As part of the work in pre-deploying
> job
> > >>>> specifications to be used by multiple hyracks jobs, I've been
> working
> > on
> > >>>> removing the transaction id from the job specifications, as we use a
> > new
> > >>>> transaction for each invocation of a deployed job.
> > >>>>
> > >>>> There is currently no clear way to remove the transaction id from
> the
> > job
> > >>>> spec and keep the option for MultiTransactionJobletEventLis
> > tenerFactory.
> > >>>>
> > >>>> The question for the group is, do we see a need to maintain this
> class
> > >>> that
> > >>>> will no longer be used by any current code? Or, an other words, is
> > there
> > >>> a
> > >>>> strong possibility that in the future we will want multiple
> > transactions
> > >>> to
> > >>>> share a single Hyracks job, meaning that it is worth figuring out
> how
> > to
> > >>>> maintain this class?
> > >>>>
> > >>>> Steven
> > >>>
> > >>>
> >
> >
>

Re: MultiTransactionJobletEventListenerFactory

Posted by Xikui Wang <xi...@uci.edu>.
How about we separate the ingestion part from the rest? We can create Job0
for the ingestion which takes data from the datasource, and create Job1,
Job2, ... for the connections to dataset1, dataset2, dataset3
respectively... We would need to pay the resource overhead still, but the
synchronization can be avoided. (I'm in the same camp as you, Abdullah. I
just want to pick your brain to see how far this idea can go. :) )

If we want to keep multiple transactions in a single job and keep the
transaction id out of the job specification, we need to let the commit
runtime get the right transaction id from somewhere... Any good idea on
this?

Best,
Xikui

On Thu, Nov 16, 2017 at 3:10 PM, abdullah alamoudi <ba...@gmail.com>
wrote:

> We are using multiple transactions in a single job in case of feed and I
> think that this is the correct way.
> Having a single job for a feed that feeds into multiple datasets is a good
> thing since job resources/feed resources are consolidated.
>
> Here are some points:
> - We can't use the same transaction id to feed multiple datasets. The only
> other option is to have multiple jobs each feeding a different dataset.
> - Having multiple jobs (in addition to the extra resources used, memory
> and CPU) would then forces us to either read data from external sources
> multiple times, parse records multiple times, etc
>   or having to have a synchronization between the different jobs and the
> feed source within asterixdb. IMO, this is far more complicated than having
> multiple transactions within a single job and the cost far outweigh the
> benefits.
>
> P.S,
> We are also using this for bucket connections in Couchbase Analytics.
>
> > On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org> wrote:
> >
> > If there are a number of issue with supporting multiple transaction ids
> > and no clear benefits/use-cases, I’d vote for simplification :)
> > Also, code that’s not being used has a tendency to "rot" and so I think
> > that it’s usefulness might be limited by the time we’d find a use for
> > this functionality.
> >
> > My 2c,
> > Till
> >
> > On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >
> >> I'm separating the connections into different jobs in some of my
> >> experiments... but that was intended to be used for the experimental
> >> settings (i.e., not for master now)...
> >>
> >> I think the interesting question here is whether we want to allow one
> >> Hyracks job to carry multiple transactions. I personally think that
> should
> >> be allowed as the transaction and job are two separate concepts, but I
> >> couldn't find such use cases other than the feeds. Does anyone have a
> good
> >> example on this?
> >>
> >> Another question is, if we do allow multiple transactions in a single
> >> Hyracks job, how do we enable commit runtime to obtain the correct TXN
> id
> >> without having that embedded as part of the job specification.
> >>
> >> Best,
> >> Xikui
> >>
> >> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <ba...@gmail.com>
> >> wrote:
> >>
> >>> I am curious as to how feed will work without this?
> >>>
> >>> ~Abdullah.
> >>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu> wrote:
> >>>>
> >>>> Hi all,
> >>>> We currently have MultiTransactionJobletEventListenerFactory, which
> >>> allows
> >>>> for one Hyracks job to run multiple Asterix transactions together.
> >>>>
> >>>> This class is only used by feeds, and feeds are in process of
> changing to
> >>>> no longer need this feature. As part of the work in pre-deploying job
> >>>> specifications to be used by multiple hyracks jobs, I've been working
> on
> >>>> removing the transaction id from the job specifications, as we use a
> new
> >>>> transaction for each invocation of a deployed job.
> >>>>
> >>>> There is currently no clear way to remove the transaction id from the
> job
> >>>> spec and keep the option for MultiTransactionJobletEventLis
> tenerFactory.
> >>>>
> >>>> The question for the group is, do we see a need to maintain this class
> >>> that
> >>>> will no longer be used by any current code? Or, an other words, is
> there
> >>> a
> >>>> strong possibility that in the future we will want multiple
> transactions
> >>> to
> >>>> share a single Hyracks job, meaning that it is worth figuring out how
> to
> >>>> maintain this class?
> >>>>
> >>>> Steven
> >>>
> >>>
>
>

Re: MultiTransactionJobletEventListenerFactory

Posted by Murtadha Hubail <hu...@gmail.com>.
Yes, this can happen in metadata transactions. I understand some of Abdullah’s concerns about reusing
the same transaction id for multiple datasets, at least from our transaction model's point of view.
However, I believe the newly added infrastructure to support atomic transactions on multiple datasets
can be tweaked a bit to support this case. But Abdullah might have other concerns that are not related
to this.

Cheers,
Murtadha

On 17/11/2017, 11:20 AM, "Mike Carey" <dt...@gmail.com> wrote:

    It's not clear to me why we can't do what's said below, BTW.  
    (@Murtadah, doesn't this sometimes happen in the land of metadata 
    transactions now?)
    
    
    On 11/16/17 3:10 PM, abdullah alamoudi wrote:
    > We can't use the same transaction id to feed multiple datasets.
    
    



Re: MultiTransactionJobletEventListenerFactory

Posted by Mike Carey <dt...@gmail.com>.
It's not clear to me why we can't do what's said below, BTW.  
(@Murtadha, doesn't this sometimes happen in the land of metadata 
transactions now?)


On 11/16/17 3:10 PM, abdullah alamoudi wrote:
> We can't use the same transaction id to feed multiple datasets.


Re: MultiTransactionJobletEventListenerFactory

Posted by abdullah alamoudi <ba...@gmail.com>.
We are using multiple transactions in a single job in case of feed and I think that this is the correct way.
Having a single job for a feed that feeds into multiple datasets is a good thing since job resources/feed resources are consolidated.

Here are some points:
- We can't use the same transaction id to feed multiple datasets. The only other option is to have multiple jobs each feeding a different dataset.
- Having multiple jobs (in addition to the extra resources used, memory and CPU) would then force us either to read data from external sources multiple times, parse records multiple times, etc.,
  or to have synchronization between the different jobs and the feed source within AsterixDB. IMO, this is far more complicated than having multiple transactions within a single job, and the costs far outweigh the benefits.

P.S,
We are also using this for bucket connections in Couchbase Analytics.

> On Nov 16, 2017, at 2:57 PM, Till Westmann <ti...@apache.org> wrote:
> 
> If there are a number of issue with supporting multiple transaction ids
> and no clear benefits/use-cases, I’d vote for simplification :)
> Also, code that’s not being used has a tendency to "rot" and so I think
> that it’s usefulness might be limited by the time we’d find a use for
> this functionality.
> 
> My 2c,
> Till
> 
> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> 
>> I'm separating the connections into different jobs in some of my
>> experiments... but that was intended to be used for the experimental
>> settings (i.e., not for master now)...
>> 
>> I think the interesting question here is whether we want to allow one
>> Hyracks job to carry multiple transactions. I personally think that should
>> be allowed as the transaction and job are two separate concepts, but I
>> couldn't find such use cases other than the feeds. Does anyone have a good
>> example on this?
>> 
>> Another question is, if we do allow multiple transactions in a single
>> Hyracks job, how do we enable commit runtime to obtain the correct TXN id
>> without having that embedded as part of the job specification.
>> 
>> Best,
>> Xikui
>> 
>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <ba...@gmail.com>
>> wrote:
>> 
>>> I am curious as to how feed will work without this?
>>> 
>>> ~Abdullah.
>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu> wrote:
>>>> 
>>>> Hi all,
>>>> We currently have MultiTransactionJobletEventListenerFactory, which
>>> allows
>>>> for one Hyracks job to run multiple Asterix transactions together.
>>>> 
>>>> This class is only used by feeds, and feeds are in process of changing to
>>>> no longer need this feature. As part of the work in pre-deploying job
>>>> specifications to be used by multiple hyracks jobs, I've been working on
>>>> removing the transaction id from the job specifications, as we use a new
>>>> transaction for each invocation of a deployed job.
>>>> 
>>>> There is currently no clear way to remove the transaction id from the job
>>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
>>>> 
>>>> The question for the group is, do we see a need to maintain this class
>>> that
>>>> will no longer be used by any current code? Or, an other words, is there
>>> a
>>>> strong possibility that in the future we will want multiple transactions
>>> to
>>>> share a single Hyracks job, meaning that it is worth figuring out how to
>>>> maintain this class?
>>>> 
>>>> Steven
>>> 
>>> 


Re: MultiTransactionJobletEventListenerFactory

Posted by Till Westmann <ti...@apache.org>.
If there are a number of issues with supporting multiple transaction ids
and no clear benefits/use-cases, I’d vote for simplification :)
Also, code that’s not being used has a tendency to "rot", and so I think
that its usefulness might be limited by the time we’d find a use for
this functionality.

My 2c,
Till

On 16 Nov 2017, at 13:57, Xikui Wang wrote:

> I'm separating the connections into different jobs in some of my
> experiments... but that was intended to be used for the experimental
> settings (i.e., not for master now)...
>
> I think the interesting question here is whether we want to allow one
> Hyracks job to carry multiple transactions. I personally think that should
> be allowed as the transaction and job are two separate concepts, but I
> couldn't find such use cases other than the feeds. Does anyone have a good
> example on this?
>
> Another question is, if we do allow multiple transactions in a single
> Hyracks job, how do we enable commit runtime to obtain the correct TXN id
> without having that embedded as part of the job specification.
>
> Best,
> Xikui
>
> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <ba...@gmail.com>
> wrote:
>
>> I am curious as to how feed will work without this?
>>
>> ~Abdullah.
>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu> wrote:
>>>
>>> Hi all,
>>> We currently have MultiTransactionJobletEventListenerFactory, which
>> allows
>>> for one Hyracks job to run multiple Asterix transactions together.
>>>
>>> This class is only used by feeds, and feeds are in process of changing to
>>> no longer need this feature. As part of the work in pre-deploying job
>>> specifications to be used by multiple hyracks jobs, I've been working on
>>> removing the transaction id from the job specifications, as we use a new
>>> transaction for each invocation of a deployed job.
>>>
>>> There is currently no clear way to remove the transaction id from the job
>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
>>>
>>> The question for the group is, do we see a need to maintain this class
>> that
>>> will no longer be used by any current code? Or, an other words, is there
>> a
>>> strong possibility that in the future we will want multiple transactions
>> to
>>> share a single Hyracks job, meaning that it is worth figuring out how to
>>> maintain this class?
>>>
>>> Steven
>>
>>

Re: MultiTransactionJobletEventListenerFactory

Posted by Xikui Wang <xi...@uci.edu>.
I'm separating the connections into different jobs in some of my
experiments... but that was intended to be used for the experimental
settings (i.e., not for master now)...

I think the interesting question here is whether we want to allow one
Hyracks job to carry multiple transactions. I personally think that should
be allowed as the transaction and job are two separate concepts, but I
couldn't find such use cases other than the feeds. Does anyone have a good
example on this?

Another question is, if we do allow multiple transactions in a single
Hyracks job, how do we enable the commit runtime to obtain the correct TXN
id without having that embedded as part of the job specification?
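For concreteness, the per-job map approach mentioned earlier in the thread could look roughly like the sketch below: the joblet event listener holds one dataset-id to transaction-id map per Hyracks job, populated when the job starts, and the commit runtime asks it for the right sub-transaction. All class and method names here are hypothetical illustrations, not the actual AsterixDB classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-job listener that maps each dataset to its
// own sub-transaction, so the txn id need not live in the job specification.
// Assumes a feed job never connects to the same dataset twice.
public class MultiTxnJobletListenerSketch {
    // One map per Hyracks job: dataset id -> transaction id.
    // Looked up once per commit runtime, not once per record.
    private final Map<Integer, Long> datasetToTxnId = new HashMap<>();

    // Called when the job starts, once per connected dataset.
    public void registerTransaction(int datasetId, long txnId) {
        datasetToTxnId.put(datasetId, txnId);
    }

    // The commit runtime would call this with its dataset id to find
    // which sub-transaction to commit.
    public long txnIdForDataset(int datasetId) {
        Long txnId = datasetToTxnId.get(datasetId);
        if (txnId == null) {
            throw new IllegalStateException(
                    "no transaction registered for dataset " + datasetId);
        }
        return txnId;
    }

    public static void main(String[] args) {
        MultiTxnJobletListenerSketch listener = new MultiTxnJobletListenerSketch();
        listener.registerTransaction(101, 9001L);
        listener.registerTransaction(102, 9002L);
        System.out.println(listener.txnIdForDataset(101));
        System.out.println(listener.txnIdForDataset(102));
    }
}
```

This only works if the dataset id is available to the commit runtime at run time, which sidesteps baking the txn id into the (now reusable) job spec.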

Best,
Xikui

On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <ba...@gmail.com>
wrote:

> I am curious as to how feeds will work without this?
>
> ~Abdullah.
> > On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu> wrote:
> >
> > Hi all,
> > We currently have MultiTransactionJobletEventListenerFactory, which
> allows
> > for one Hyracks job to run multiple Asterix transactions together.
> >
> > This class is only used by feeds, and feeds are in the process of changing to
> > no longer need this feature. As part of the work in pre-deploying job
> > specifications to be used by multiple hyracks jobs, I've been working on
> > removing the transaction id from the job specifications, as we use a new
> > transaction for each invocation of a deployed job.
> >
> > There is currently no clear way to remove the transaction id from the job
> > spec and keep the option for MultiTransactionJobletEventListenerFactory.
> >
> > The question for the group is, do we see a need to maintain this class
> that
> > will no longer be used by any current code? Or, in other words, is there
> a
> > strong possibility that in the future we will want multiple transactions
> to
> > share a single Hyracks job, meaning that it is worth figuring out how to
> > maintain this class?
> >
> > Steven
>
>

Re: MultiTransactionJobletEventListenerFactory

Posted by abdullah alamoudi <ba...@gmail.com>.
I am curious as to how feeds will work without this?

~Abdullah.
> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <sj...@ucr.edu> wrote:
> 
> Hi all,
> We currently have MultiTransactionJobletEventListenerFactory, which allows
> for one Hyracks job to run multiple Asterix transactions together.
> 
> This class is only used by feeds, and feeds are in the process of changing to
> no longer need this feature. As part of the work in pre-deploying job
> specifications to be used by multiple hyracks jobs, I've been working on
> removing the transaction id from the job specifications, as we use a new
> transaction for each invocation of a deployed job.
> 
> There is currently no clear way to remove the transaction id from the job
> spec and keep the option for MultiTransactionJobletEventListenerFactory.
> 
> The question for the group is, do we see a need to maintain this class that
> will no longer be used by any current code? Or, in other words, is there a
> strong possibility that in the future we will want multiple transactions to
> share a single Hyracks job, meaning that it is worth figuring out how to
> maintain this class?
> 
> Steven