You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cocoon.apache.org by Leszek Gawron <ou...@wlkp.org> on 2003/07/29 23:11:36 UTC

What to do if the functionality cannot be implemented with OR mapping

This would be quite long, sometimes OT but I think it will finally get to my
point which is: providing persistence in flow by basing only on O/R tool is not
enough.

OR mapping tool is OK if you have got a _static_ database schema but imagine
you have to write an application that collects questionnaires. 
Some assumptions:
  * The amout of questionnaires collected is 500'000 per year,
  * Every questionnaire contains at least 100 entries, 
  * Each entry contains 5 the same questions (suppose you are
    generating database driver questionnaires about some products). 
  * There are about 100 questionnaire types (different questions) - 5'000 per
    type.

* database schema solution 1: Every answer is a row in a answer table.
It looks something like:

question (
 id int,
 description varchar,
 question_type int
)

questionnaire (
 id int, 
 q_date date,
 description varchar
)

product (
 id int,
 name varchar
 description varchar
)

questionnaire_answer (
 questionnaire_id int,
 product_id int,
 question_id int,
 response varchar
) 

Doing some math: 5 rows per product, 500 rows per fulfilled questionnaire,
250'000'000 (!) rows in one table per year. That is far too much.

* database schema solution 2 
Your application goal is to generate some statistics. But as far as cross
questionnaire statistics do not concert you there is a possibility to break
down the questionnaire_answer table to a bunch of tables (one table per
questionnaire type)

so you introduce:

questionnaire_type (
 id int, 
 name varchar,
 table_name varchar
)

modify questionnaire : 

questionnaire (
 id int,
 q_type int,
 q_date date,
 description varchar
)

you loose your questionnaire_answer table, instead you generate
questionnaire_XXX of the same schema every time new questionnaire type is
introduced.

Basing on assumption you have 100 types right now this limits the number of
rows in each table to 2'500'000. Better, but still far from good.

* database schema solution 3
Assuming the questionnaire schema never changes you can provide a user with a
interface to build a questionnaire schema model and then generate a static
representation in which every product in questionnaire gets not 5 rows but 
just one:

questionnaire_XXX (
 questionnaire_id int,
 product_id int,
 question_1_response varchar,
 question_2_response varchar,
 question_3_response varchar,
 question_4_response varchar,
 question_5_response varchar
)

This way you were able to limit your rows number to 500'000/table/year which
is GOOD!.

You want to write an application using flow and other cocoon possibilities
because:
  a) you will have to provide a user with interface for all of this - flow is
     the sore for your eyes
  b) you will have to generate some reports/statistics and so on - pipeline
     processing would make it real fun

BUT:

1. I CAN BET MY ASS THIS IS IMPOSSIBLE TO MODEL WITH ANY O/R MAPPING TOOL.

2. Even if there is (I'm gonna have to live without my ass) I do not really
   think it is going to be any useful (and/or efficient) for advanced reports 
   (and there's a loooot of statistics to make if you just add some more
   parameters to all this).

3. the only thing cocooners say till now is: use OR mapping tool with flow, otherwise
   you break SoC, you will burn in hell! :)

You say: that is not something that should be built on cocoon. 
I say: cannot be more wrong - suppose the main purpose for this all are web reports
and statistics?

So the real question goes here:
Is there a clean way (in terms of SoC and all that stuff)?

and the conclusion goes here:
Not everything can be implemented with OR tool. Even if it breaks SoC maybe it
would be worth to provide developers with some not-so-clean-and-elegant
replacement but working instead.

You can always do it the hard way (prepare generators and transformers talking
to database directly through JDBC), but maybe there is a cleaner way i do not
see.

What do you think?

Hope it's not just wasting your time.

my rergards
	Leszek Gawron