
Posted to users@wicket.apache.org by Tauren Mills <ta...@tauren.com> on 2007/10/29 09:58:49 UTC

How to manage multiple versions of data?

I have a wicket/hibernate/spring project that manages a set of live
data.  Users of the system view the live version of the data.
Currently, administrative CRUD alters the live data as well.  Changes
by an admin are immediately reflected on the site to users.

But a new set of features will require that administrators be able to
make updates and changes without affecting the live
data.  Once they are satisfied with the changes they have made, they
can save/publish the "working data" to be "live data" on the site.

The modification/editing of this data can take a fair amount of time,
perhaps several days.  Thus, I can't just put the working data into
the session and then save it to the DB a few minutes later when the
editing is done.  The working data needs to be able to persist as well
as the live data.

Also, I'm reluctant to just make copies of all the data from the live
version to the working version, because there are certain aspects of
the live data that can be altered by the users.  The users don't
actually edit the basic data, but additional information is attached
to the data as the user uses it.  If the admin is editing data and it
takes a few days, the copy of the data could get stale as users use
it.  Then I'd have data synchronization issues to deal with.

I would google for patterns or techniques to deal with this type of
situation, but I'm not sure what to even search for.  Has anyone had
to deal with anything similar?  Any advice or suggestions?  Or even
keywords to search for?

Thanks!
Tauren

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
For additional commands, e-mail: users-help@wicket.apache.org

Re: How to manage multiple versions of data?

Posted by Scott Swank <sc...@gmail.com>.

If you're going to have substantial amounts of data, then you probably
want a database that supports partitioning, and then partition
the data by the "live" column.

- Scott


On 10/29/07, Igor Vaynberg <ig...@gmail.com> wrote:
> usually this is done by supporting multiple rows with version and live
> flags. the system always tries to query objects that are marked live, so
> instead of saying something like
>
>     FROM User u WHERE u.name=?
>
> you would do
>
>     FROM User u WHERE u.name=? AND u.live=true
>
> this saves you from having to support what essentially are two sets of
> the same tables in the same schema.
>
> this is also how soft deletes work, which is something you will
> probably have to support if you want to support versioning properly...
>
> just some high level thoughts.
>
> -igor


-- 
Scott Swank
reformed mathematician


Re: How to manage multiple versions of data?

Posted by Igor Vaynberg <ig...@gmail.com>.

usually this is done by supporting multiple rows with version and live
flags. the system always tries to query objects that are marked live, so
instead of saying something like

    FROM User u WHERE u.name=?

you would do

    FROM User u WHERE u.name=? AND u.live=true

this saves you from having to support what essentially are two sets of
the same tables in the same schema.

this is also how soft deletes work, which is something you will
probably have to support if you want to support versioning properly...
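A minimal sketch of the live-flag idea in Java, assuming a hypothetical `UserRow` type with `version`, `live` and `deleted` fields (the `deleted` flag is an illustrative assumption for the soft-delete case). An in-memory list stands in for the Hibernate-mapped table so the example runs on its own; in the real application the filtering would be done by the HQL query itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row type: several rows may share a name, differing in
// version, live flag and (assumed) soft-delete flag.
record UserRow(long id, String name, int version, boolean live, boolean deleted) {}

class UserRepository {
    private final List<UserRow> rows = new ArrayList<>();

    void add(UserRow row) {
        rows.add(row);
    }

    // In-memory equivalent of an HQL query such as
    // FROM User u WHERE u.name=? AND u.live=true AND u.deleted=false
    List<UserRow> findLive(String name) {
        return rows.stream()
                .filter(r -> r.name().equals(name))
                .filter(UserRow::live)
                .filter(r -> !r.deleted())
                .collect(Collectors.toList());
    }

    // Soft delete: the row stays in storage but no longer matches live queries.
    void softDelete(long id) {
        rows.replaceAll(r -> r.id() == id
                ? new UserRow(r.id(), r.name(), r.version(), r.live(), true)
                : r);
    }
}
```

Keeping both flags on the same table is what lets one set of tables hold live, working and deleted rows side by side.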

just some high level thoughts.

-igor


Re: How to manage multiple versions of data?

Posted by Tauren Mills <ta...@tauren.com>.

Thanks for the ideas, everyone.  I do like Igor's suggestion of reusing
the same tables and having a "live" flag, as I'd hate to have an
entire extra set of duplicate tables.  However, since this approach
duplicates data, there are still some issues that I'm not sure how to
best deal with.

Let me explain the use case in more detail.  The users of the system
are referees of soccer games.  They need to be able to view their
schedule of matches.  A match includes a game number, date, time, home
team, away team, location, a center referee, and two linesmen.  The
users do not modify any of the main data elements associated with a
match.  But they can accept or reject matches, and that acceptance
information is associated with a match and the referee.

Administrators will be scheduling all of the matches, assigning
referees to matches, etc.  There could be several administrators.
There are many changes and updates that happen on a regular basis to
already published schedules.  Sometimes entire fields are closed due
to bad conditions, and all of the matches have to be moved elsewhere.
The updating of data can take hours or days, depending on whether the
admin is waiting on information.

Updated data should not be released one match at a time, but all
changes should go live at once.  There are many business reasons for
this, and I don't want to try to explain them all.  They mainly
involve publishing incomplete data, schedule reviews, and approvals
prior to publishing.  Creating and modifying these schedules can take
a lot of trial and error, moving matches around and trying to get
everything to work well for all parties involved.  It wouldn't be good
if referees looked at their schedules when a match was in a state of
flux and then didn't check the schedule again. Lastly, when a batch of
changes is completed, referees could be notified of the changes all at
once, not one match change at a time.
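The "notify once per batch" requirement can be sketched as grouping a published batch by referee. This is a minimal illustration under assumptions: `MatchChange`, `ChangeNotifier` and the referee names are hypothetical, and actually sending the messages is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical: one published change to a match and the referees assigned to it.
record MatchChange(int matchNumber, List<String> referees) {}

class ChangeNotifier {
    // Group a whole batch of published changes by referee, so each referee
    // receives a single notification listing all of their affected matches.
    static Map<String, Set<Integer>> notificationsByReferee(List<MatchChange> batch) {
        Map<String, Set<Integer>> result = new TreeMap<>();
        for (MatchChange change : batch) {
            for (String referee : change.referees()) {
                result.computeIfAbsent(referee, k -> new TreeSet<>()).add(change.matchNumber());
            }
        }
        return result;
    }
}
```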

So, given this scenario, a live record Match #100 could have
CenterReferee accept the game, Linesman1 reject it, and Linesman2 not
yet respond.  Meanwhile, if there is a non-live duplicate of Match
#100, then that acceptance information would need to be associated
with it as well.  But the acceptance information could change after
the non-live record is created.

I suppose that when a non-live match that already has a live version is
saved (thus making it the new live record), the information attached to
the existing live record could be copied to the non-live record, the old
live record deleted, and the non-live record made live.  But to me, this
just seems hacky and prone to failure, the way data synchronization
systems always seem to get screwed up at some point.  Maybe it is the
only/best/simplest solution, though.  I also need to think about how to
purge old edit data that never gets saved.
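A rough sketch of that promotion step, assuming a hypothetical schema in which each `Acceptance` points at a specific match row by id. Here the acceptances are re-pointed at the edit row rather than duplicated, which has the same effect as copying them and then deleting the old live row. Collections stand in for tables; in a real Hibernate application the whole method would presumably run inside a single transaction so referees never see a half-published match.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical match row: the live and edit versions share a matchNumber.
class MatchRecord {
    final long id;
    final int matchNumber;
    boolean live;

    MatchRecord(long id, int matchNumber, boolean live) {
        this.id = id;
        this.matchNumber = matchNumber;
        this.live = live;
    }
}

// Hypothetical referee response, attached to one specific match row by id.
class Acceptance {
    long matchRecordId;
    final String referee;
    final String status; // e.g. "ACCEPTED" or "REJECTED"

    Acceptance(long matchRecordId, String referee, String status) {
        this.matchRecordId = matchRecordId;
        this.referee = referee;
        this.status = status;
    }
}

class MatchStore {
    final Map<Long, MatchRecord> matches = new HashMap<>();
    final List<Acceptance> acceptances = new ArrayList<>();

    // Promote an edit row to live: move the current live row's acceptances
    // onto the edit row, delete the old live row, then flag the edit row live.
    void publish(long editId) {
        MatchRecord edit = matches.get(editId);
        if (edit == null || edit.live) {
            throw new IllegalArgumentException("not an edit row: " + editId);
        }
        MatchRecord oldLive = null;
        for (MatchRecord m : matches.values()) {
            if (m.live && m.matchNumber == edit.matchNumber) {
                oldLive = m;
                break;
            }
        }
        if (oldLive != null) {
            for (Acceptance a : acceptances) {
                if (a.matchRecordId == oldLive.id) {
                    a.matchRecordId = edit.id;
                }
            }
            matches.remove(oldLive.id);
        }
        edit.live = true;
    }
}
```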

I think I could make do with only live and edit versions of
the data (just 2 rows of the same data at a time).  I don't think I
need to keep track of multiple versions.  It would be nice to be able
to support multiple "batch edit sessions" that are each distinct from
the others.  But that would get *really* complex...

Still, it would be nice if the admin could save just a certain set
of matches (say all 1st Division games, but leave all 2nd Division
games in edit mode).  That way, the admin could work on everything,
and publish sets of changes as he completes them instead of waiting
until all changes are done.  This should still be doable with just a
live and edit version of the data records.
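Publishing one division while leaving the rest in edit mode might look roughly like this, assuming a hypothetical `ScheduleRow` in which the live and edit versions of a match share a `matchNumber`. It is an in-memory sketch only; a real implementation would run the batch in one transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical schedule row: live and edit versions share a matchNumber.
class ScheduleRow {
    final int matchNumber;
    final String division;
    final String location;
    boolean live;

    ScheduleRow(int matchNumber, String division, String location, boolean live) {
        this.matchNumber = matchNumber;
        this.division = division;
        this.location = location;
        this.live = live;
    }
}

class Schedule {
    final List<ScheduleRow> rows = new ArrayList<>();

    // Publish every pending edit in one division as a single batch. Each edit
    // row replaces its live counterpart; edits in other divisions stay pending.
    // Returns the number of matches published.
    int publishDivision(String division) {
        List<ScheduleRow> edits = new ArrayList<>();
        for (ScheduleRow r : rows) {
            if (!r.live && r.division.equals(division)) {
                edits.add(r);
            }
        }
        for (ScheduleRow edit : edits) {
            rows.removeIf(r -> r.live && r.matchNumber == edit.matchNumber);
            edit.live = true;
        }
        return edits.size();
    }

    // Location of the live version of a match, or null if there is none.
    String liveLocation(int matchNumber) {
        for (ScheduleRow r : rows) {
            if (r.live && r.matchNumber == matchNumber) {
                return r.location;
            }
        }
        return null;
    }
}
```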

After writing this, I have a much better sense of how to do it.  But
I'd still like to hear any other ideas.  Are there some patterns or
solutions that I should look into that could help with this?  I don't
think long transactions are the way to go for this.  Database
partitioning (which doesn't really solve this problem but is a great
idea for large datasets) isn't needed yet, as the amount of data will
be manageable.

Thanks again!
Tauren

On 10/30/07, Eelco Hillenius <ee...@gmail.com> wrote:
> > using long transactions would be pretty horrible :(
> > long transactions should be avoided like the plague at all costs,
> > especially with certain databases (like the Microsoft or Sybase variants).
>
> I never really used them myself, so that's entirely possible :-)
>
> > I would do it the way Igor proposes: have multiple rows for the same data,
> > with 1 row that is live and another row (maybe with the editing user id as
> > data so that you know what to get the next time).
>
> Yeah, that's a workable idea. The problem then is cleaning up the
> entries that never were finished, and just the fact that you're
> polluting your data model with administrative data. But it might be
> the best solution.
>
> Eelco

Re: How to manage multiple versions of data?

Posted by Eelco Hillenius <ee...@gmail.com>.

> using long transactions would be pretty horrible :(
> long transactions should be avoided like the plague at all costs,
> especially with certain databases (like the Microsoft or Sybase variants).

I never really used them myself, so that's entirely possible :-)

> I would do it the way Igor proposes: have multiple rows for the same data,
> with 1 row that is live and another row (maybe with the editing user id as
> data so that you know what to get the next time).

Yeah, that's a workable idea. The problem then is cleaning up the
entries that never were finished, and just the fact that you're
polluting your data model with administrative data. But it might be
the best solution.
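Cleaning up abandoned edits could be as simple as a periodic purge by age. A minimal sketch, assuming each edit row records a hypothetical `lastModified` timestamp and that "stale" means untouched for longer than some configurable `maxAge`:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical edit row: which admin is editing it and when it was last touched.
record EditRow(long id, String editorId, Instant lastModified) {}

class EditCleanup {
    // Remove edit rows not modified within maxAge, i.e. edits that were
    // abandoned and never published. Returns the number of rows removed.
    static int purgeStale(List<EditRow> edits, Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        int before = edits.size();
        edits.removeIf(e -> e.lastModified().isBefore(cutoff));
        return before - edits.size();
    }
}
```

Passing `now` in explicitly, rather than calling `Instant.now()` inside, keeps the purge easy to test and lets a scheduled job decide the clock.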

Eelco


Re: How to manage multiple versions of data?

Posted by Johan Compagner <jc...@gmail.com>.

using long transactions would be pretty horrible :(
long transactions should be avoided like the plague at all costs,
especially with certain databases (like the Microsoft or Sybase variants).

I would do it the way Igor proposes: have multiple rows for the same data,
with 1 row that is live and another row (maybe with the editing user id as
data so that you know what to get the next time).
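Loading the right row for a given admin could be sketched as follows, assuming a hypothetical `editorId` column that is null on the live row and holds the admin's user id on that admin's working copy:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical row: editorId is null for the live row and holds the
// editing admin's user id for that admin's working copy.
record VersionedRow(long id, int matchNumber, boolean live, String editorId, String location) {}

class RowLookup {
    // What an admin should see: their own working copy if one exists,
    // otherwise the live row. Empty if the match does not exist at all.
    static Optional<VersionedRow> forEditor(List<VersionedRow> rows, int matchNumber, String editorId) {
        Optional<VersionedRow> own = rows.stream()
                .filter(r -> r.matchNumber() == matchNumber && !r.live() && editorId.equals(r.editorId()))
                .findFirst();
        if (own.isPresent()) {
            return own;
        }
        return rows.stream()
                .filter(r -> r.matchNumber() == matchNumber && r.live())
                .findFirst();
    }
}
```

With one working copy per editor, this also answers part of the question below: several admins could each have their own edit row for the same match, at the cost of deciding whose edits win on publish.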

The question is: can more than 1 person be editing data? Are they each
editing their own set, or are there only 2 rows of the same data at any
one time (live and edit)?

johan



On 10/29/07, Eelco Hillenius <ee...@gmail.com> wrote:
> No real experience with this myself, but start with 'long transactions'?
>
> Eelco

Re: How to manage multiple versions of data?

Posted by Nick Heudecker <nh...@gmail.com>.

I would also try searching for auditing persistent objects.  There should be
a few pages on the Hibernate wiki about it.

-- 
Nick Heudecker
Professional Wicket Training & Consulting
http://www.systemmobile.com

Eventful - Intelligent Event Management
http://www.eventfulhq.com

Re: How to manage multiple versions of data?

Posted by Eelco Hillenius <ee...@gmail.com>.

No real experience with this myself, but start with 'long transactions'?

Eelco
