Posted to user@commons.apache.org by Bear Giles <bg...@coyotesong.com> on 2009/04/19 15:53:43 UTC

proposal: numerical models [physics?]

Hi, I was wondering if there would be interest in numerical models of 
physical constants.  For instance, saturation pressure of water vapor in 
air at a particular temperature.  It would also be appropriate to 
provide a method to get relative humidity from wet and dry bulb 
temperatures since it directly relates to this saturation pressure.  All 
of the models should be time-invariant, e.g., no historical weather 
observations.
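
As a rough illustration of the kind of model meant here, the sketch below
computes saturation vapor pressure with the Magnus approximation and relative
humidity with the usual psychrometer equation.  The coefficients (6.1094 hPa,
17.625, 243.04 degC) and the psychrometer constant 6.62e-4 per K are
assumptions taken from common textbook forms, not from any dataset discussed
in this thread, and the function names are hypothetical.

```python
import math


def saturation_pressure_hpa(temp_c):
    """Saturation vapor pressure over liquid water in hPa (Magnus form).

    The three coefficients are assumed textbook values, not fitted here.
    """
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def relative_humidity(dry_c, wet_c, pressure_hpa=1013.25, psychro_const=6.62e-4):
    """Relative humidity in percent from dry- and wet-bulb temperatures (degC).

    psychro_const is an assumed value for a ventilated psychrometer.
    """
    # Actual vapor pressure from the psychrometer equation.
    vapor = saturation_pressure_hpa(wet_c) - psychro_const * pressure_hpa * (dry_c - wet_c)
    return 100.0 * vapor / saturation_pressure_hpa(dry_c)


if __name__ == "__main__":
    print(saturation_pressure_hpa(20.0))
    print(relative_humidity(25.0, 18.0))
```

When the wet- and dry-bulb temperatures are equal the air is saturated and
the second function returns 100, which is a handy sanity check.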

A simple unexplained model isn't very useful so there would be multiple 
elements:

1) a significant and attributed dataset.  You don't want three numbers 
somebody found in the back of some book, you want something like 1% of 
the values in the 74th edition of the CRC handbook of bobcats and 
weasels.  This would be provided as an XML document, and there would be a 
mechanism to support both multivariate data and sets of related values.

I don't know how copyright plays into this since it's redistributing 
data.  I know that, in the past, it would have been okay in the US at 
least.  There are famous cases involving phone books, 
compilers/assemblers, etc., that established that you can copyright the 
presentation but not facts.  But I know publishers were trying to change 
that, and I'm not familiar with copyright law in other countries.

(It would also be nice to have tools to help people see if they screwed 
up the data when entering it.)

2) one or more numeric models, plus methods to calculate the appropriate 
coefficients from the data. You could have multiple models because of 
different needs.  E.g., one person requires a highly accurate model, but 
for somebody else the best fit could be something 'quick and dirty' 
since they're computing millions of values but don't need high accuracy.

3) analysis tools, to determine how accurate the model is.

4) (maybe) tools to create standard charts and graphs.  E.g., in 
meteorology there is a standard chart used with weather balloons because 
it makes it easy to determine if the atmosphere is unstable.  Having the 
ability to produce this chart + overlaid data would be very useful, but 
what format?  With what tools?  E.g., do you produce embedded postscript 
(for print media)?  An image?  An SVG?

The second and third items could probably pull a lot from [math], or 
even reside in that project, with just the actual models and the 
underlying data in this project.  Obviously people should be able to 
download just the model.  On the other hand some people might want to 
write their own models and having the tools and data in place would be a 
godsend.

About myself: I have undergraduate degrees in both math and physics, 
most of an advanced degree in computer science, and have been working as 
a professional software developer for 25 years.  The motivation for this 
proposal is working at NOAA a decade ago - I worked with scientists who 
knew the science but didn't know they didn't know enough to write 
well-engineered numerical models.  The 'three unattributed data points' 
isn't a joke.  I've been planning to make this proposal for years; I've 
just never gotten around to it.

Bear

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org

Re: proposal: numerical models [physics?]

Posted by Ted Dunning <te...@gmail.com>.

I think that this project is at once too limited and too grandiose.

Too limited because it restricts itself to distributing model-building tools +
data, rather than allowing some models to be included without the underlying
data (Fahrenheit to Celsius or MKS to CGS, for instance) or with documentation
describing methods of derivation and limitations of range and applicability
(i.e. the academic paper where the data was analyzed).

Too grandiose because you could just as well break it down into four (or
more) big projects.  If you do break this down, then I think you will find
that many parts of what you want to address are already done.

So, for instance, there is:

a) data repository a la wiki.  This is being done, and done very well, by
things like manyeyes <http://manyeyes.alphaworks.ibm.com/manyeyes/>,
verifiable <http://verifiable.com/> and swivel <http://www.swivel.com/>.
Combine that with the dictates of modern academic publishing to publish
on-line and to provide the raw data in addition to the primary article and
it is clear that one of these efforts is going to succeed.

b) data visualization for these public data sets.  Manyeyes and verifiable
both provide this and very nicely indeed.

c) data mining / curve fitting.  Commons Math provides these tools at
one end; Apache Mahout provides differently focused versions of this
same sort of thing.

d) a compendium of well founded physical correlations.  This is the only
part of what you are talking about that isn't already readily available and
this might be of some interest, but I don't understand how to attack it.  My
own tendency would be to say that the derivation of the law should not be up
to the reader since that so often leads to serious error.  Instead, there
should be some simple way to export an executable form of a correlation from
more or less academically rigorous publications.  This is because it is
otherwise hard to state the limitations and requirements of these
relationships.
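
One way to picture an "executable form of a correlation" that carries its own
limitations is sketched below.  The Correlation class, its field names, and
the range check are all hypothetical; nothing like this is claimed to exist in
any current library.  The only rule shown is the Fahrenheit-to-Celsius
conversion, which is exact and needs no underlying data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Correlation:
    """A rule bundled with its source and its range of applicability."""
    name: str
    source: str                       # where the rule comes from
    valid_min: float                  # lowest input the rule is stated for
    valid_max: float                  # highest input the rule is stated for
    formula: Callable[[float], float]

    def __call__(self, x: float) -> float:
        # Refuse to extrapolate outside the stated range.
        if not (self.valid_min <= x <= self.valid_max):
            raise ValueError(
                f"{self.name}: input {x} is outside "
                f"[{self.valid_min}, {self.valid_max}]"
            )
        return self.formula(x)


fahrenheit_to_celsius = Correlation(
    name="Fahrenheit to Celsius",
    source="exact by definition; no underlying dataset",
    valid_min=-459.67,                # absolute zero in degrees Fahrenheit
    valid_max=float("inf"),
    formula=lambda f: (f - 32.0) * 5.0 / 9.0,
)

if __name__ == "__main__":
    print(fahrenheit_to_celsius(212.0))
```

Keeping the source and the valid range next to the formula means a caller
cannot get a number without also having the limitations at hand.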

I also worry that you will have a serious long-tail problem very, very
quickly where the interest base of each rule is too small for there to be a
significant audience.  Compounded with inevitable difficulties in finding
the correlation rule you need, the effective social benefit could easily be
essentially nil.  In order to address the long-tail problem and this search
problem, it might be best if there were some way to effectively link from
the original academic publication to an executable form of the conclusions
of the paper.  Then, if you could strike a deal with arXiv or JMLR or PLoS,
you might have something useful.  Approaching manyeyes or verifiable might
also be useful.

Good luck with your idea!

On Sun, Apr 19, 2009 at 8:55 AM, Bear Giles <bg...@coyotesong.com> wrote:

> Luc Maisonobe wrote:
>
>> Bear Giles a écrit :
>>
>>
>>> Hi, I was wondering if there would be interest in numerical models of
>>> physical constants.  For instance, saturation pressure of water vapor in
>>> air at a particular temperature.  It would also be appropriate to
>>> provide a method to get relative humidity from wet and dry bulb
>>> temperatures since it directly relates to this saturation pressure.  All
>>> of the models should be time-invariant, e.g., no historical weather
>>> observations.
>>>
>>>
>>
>> ...
>>
> It's not an all-or-nothing situation where the project is only usable after
> thousands of models exist.  Instead I would see it slowly adding models as
> people 1) discover the tools and 2) scratch their own itch.  There may only
> be 10-20 methods added at a time, but that could be enough to significantly
> enhance the project.  I've worked with meteorological models in the past so
> it's a natural place for me to use as a seed.
>
> Also, I might not have been clear earlier that I'm thinking -solely- of
> curve-fitting observational data, either directly or via simple calculations
> of the same.  As a model, we might have observational data like:
> ...

-- 
Ted Dunning, CTO
DeepDyve

Re: proposal: numerical models [physics?]

Posted by Ted Dunning <te...@gmail.com>.

On Sun, Apr 19, 2009 at 8:55 AM, Bear Giles <bg...@coyotesong.com> wrote:

> To be honest I'm not 100% certain how useful the underlying data would be,
> but I keep coming back to the academic question of "how do you know this?"
> on the models.  Most people would be happy to just have a small java library
> that lets them avoid entering data by hand, but researchers would
> legitimately need to know the source of the model.  If we say, by fiat, that
> these concerns will not be addressed then we don't need to worry about
> providing the underlying material beyond a simple reference.
>

The question of "how do you know" is only partly the original data.  It also
requires a careful statement of how the data are analyzed.

> > [math] should remain as independent to application as possible.
>
> It would definitely be a one-way street.  But there's no point in writing a
> method for, e.g., Hermite polynomials if it already exists in [math].
> Obviously something general like that would be offered to [math], but there
> wouldn't be any sort of assumption that it would be accepted.
>

A generally useful and new mathematical method that is well written is very
unlikely to be turned down.  Luc and the community are very good about
helping people start with rough submissions and get them polished enough to
fit in well.

-- 
Ted Dunning, CTO
DeepDyve

Re: proposal: numerical models [physics?]

Posted by Bear Giles <bg...@coyotesong.com>.

Luc Maisonobe wrote:
> Bear Giles a écrit :
>   
>> Hi, I was wondering if there would be interest in numerical models of
>> physical constants.  For instance, saturation pressure of water vapor in
>> air at a particular temperature.  It would also be appropriate to
>> provide a method to get relative humidity from wet and dry bulb
>> temperatures since it directly relates to this saturation pressure.  All
>> of the models should be time-invariant, e.g., no historical weather
>> observations.
>>     
>
> I'm puzzled about this proposal. The scope seems completely unbounded
> and will get out of hand quickly.
> I would rather see such a project under the wing of wikipedia or some
> foundation like that.
>   
It's not an all-or-nothing situation where the project is only usable 
after thousands of models exist.  Instead I would see it slowly adding 
models as people 1) discover the tools and 2) scratch their own itch.  
There may only be 10-20 methods added at a time, but that could be 
enough to significantly enhance the project.  I've worked with 
meteorological models in the past so it's a natural place for me to use 
as a seed.

Also, I might not have been clear earlier that I'm thinking -solely- of 
curve-fitting observational data, either directly or via simple 
calculations of the same.  As a model, we might have observational data 
like:

x,y = (0,0.9), (1,2), (2,5.1), (3,10.3)

and the resulting model is y(x) = 1 + x*x.  Not quite 100%, but it's 
curve-fitting instead of calculations from first principles.  It would 
just be policy that only the basics of general interest would be modeled.
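
A minimal sketch of that fit, assuming the model form y = a + b*x*x and
ordinary least squares (the function names are made up for illustration; in
practice the fitting would presumably come from [math]):

```python
import math

# The four observations quoted above.
observations = [(0, 0.9), (1, 2.0), (2, 5.1), (3, 10.3)]


def fit_constant_plus_square(points):
    """Least-squares fit of y = a + b*x*x; returns (a, b)."""
    n = len(points)
    # Substituting u = x*x turns this into a straight-line fit of y on u.
    us = [x * x for x, _ in points]
    ys = [y for _, y in points]
    su, sy = sum(us), sum(ys)
    suu = sum(u * u for u in us)
    suy = sum(u * y for u, y in zip(us, ys))
    b = (n * suy - su * sy) / (n * suu - su * su)
    a = (sy - b * su) / n
    return a, b


def rms_error(points, a, b):
    """Root-mean-square residual of y = a + b*x*x against the data."""
    squares = [(a + b * x * x - y) ** 2 for x, y in points]
    return math.sqrt(sum(squares) / len(points))


a, b = fit_constant_plus_square(observations)
print(a, b, rms_error(observations, a, b))
```

The fitted coefficients come out close to, but not exactly, 1 and 1, and the
RMS residual gives a first answer to the "how accurate is the model"
question.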

I know [physics] is too broad, that it should be [physics-?] with ? to 
be determined.

>> A simple unexplained model isn't very useful so there would be multiple
>> elements:
>>
>> 1) a significant and attributed dataset.
> At least in France there is a real problem with data collections like
> that. The law that governs intellectual property (the « code de la
> propriété intellectuelle ») does have specific requirements about
> databases. Roughly, if someone has already established a database
> containing anything, you cannot build a similar database containing only
> the same data without infringing its rights.
So there's definitely a need to separately package the models and any 
supporting information.

BTW I'm not sure if this is legally a database since we're talking about 
collections of data that anyone could get in the lab.  There's nothing 
proprietary in the sense of a database containing Amazon orders for the 
population of a city.  It's far closer to the US Intel case where the 
company successfully stopped a competitor from using "move" in their 
assembler (the other company had to use something different like "mov"), 
but could not stop them from having their assembler emit the proper 
opcode for the instruction.  The ruling was based on the idea that the 
choice of the mnemonic was up to human discretion, but the opcode was 
fixed and unique even if it was arbitrarily assigned by Intel during the 
design phase.  The competitor could not emit a different opcode and 
expect the correct behavior.

To be honest I'm not 100% certain how useful the underlying data would 
be, but I keep coming back to the academic question of "how do you know 
this?" on the models.  Most people would be happy to just have a small 
java library that lets them avoid entering data by hand, but researchers 
would legitimately need to know the source of the model.  If we say, by 
fiat, that these concerns will not be addressed then we don't need to 
worry about providing the underlying material beyond a simple reference.

> [math] should remain as independent to application as possible.
It would definitely be a one-way street.  But there's no point in 
writing a method for, e.g., Hermite polynomials if it already exists in 
[math].  Obviously something general like that would be offered to 
[math], but there wouldn't be any sort of assumption that it would be 
accepted.

Bear


Re: proposal: numerical models [physics?]

Posted by Luc Maisonobe <Lu...@free.fr>.

Bear Giles a écrit :
> Hi, I was wondering if there would be interest in numerical models of
> physical constants.  For instance, saturation pressure of water vapor in
> air at a particular temperature.  It would also be appropriate to
> provide a method to get relative humidity from wet and dry bulb
> temperatures since it directly relates to this saturation pressure.  All
> of the models should be time-invariant, e.g., no historical weather
> observations.

I'm puzzled about this proposal. The scope seems completely unbounded
and will get out of hand quickly.
I would rather see such a project under the wing of wikipedia or some
foundation like that.

> 
> A simple unexplained model isn't very useful so there would be multiple
> elements:
> 
> 1) a significant and attributed dataset.  You don't want three numbers
> somebody found in the back of some book, you want something like 1% of
> the values in the 74th edition of the CRC handbook of bobcats and
> weasels.  This would be provided as an XML document, and there would be a
> mechanism to support both multivariate data and sets of related values.
> 
> I don't know how copyright plays into this since it's redistributing
> data.  I know that, in the past, it would have been okay in the US at
> least.  There are famous cases involving phone books,
> compilers/assemblers, etc., that established that you can copyright the
> presentation but not facts.  But I know publishers were trying to change
> that, and I'm not familiar with copyright law in other countries.

At least in France there is a real problem with data collections like
that. The law that governs intellectual property (the « code de la
propriété intellectuelle ») does have specific requirements about
databases. Roughly, if someone has already established a database
containing anything, you cannot build a similar database containing only
the same data without infringing its rights.

> 
> (It would also be nice to have tools to help people see if they screwed
> up the data when entering it.)
> 
> 2) one or more numeric models, plus methods to calculate the appropriate
> coefficients from the data. You could have multiple models because of
> different needs.  E.g., one person requires a highly accurate model, but
> for somebody else the best fit could be something 'quick and dirty'
> since they're computing millions of values but don't need high accuracy.
> 
> 3) analysis tools, to determine how accurate the model is.

I'm not sure the tools and the data should be wrapped together. They
simply don't have the same life cycle and are often not done by the same
teams.

> 
> 4) (maybe) tools to create standard charts and graphs.  E.g., in
> meteorology there is a standard chart used with weather balloons because
> it makes it easy to determine if the atmosphere is unstable.  Having the
> ability to produce this chart + overlaid data would be very useful, but
> what format?  With what tools?  E.g., do you produce embedded postscript
> (for print media)?  An image?  An SVG?

This may be a project on its own.

> 
> The second and third items could probably pull a lot from [math], or
> even reside in that project, with just the actual models and the
> underlying data in this project.  Obviously people should be able to
> download just the model.  On the other hand some people might want to
> write their own models and having the tools and data in place would be a
> godsend.

[math] should remain as independent to application as possible. Its goal
is to be a low level very reusable component. More specific applications
or other libraries are built on top of it in layered architectures.

> 
> About myself: I have undergraduate degrees in both math and physics,
> most of an advanced degree in computer science, and have been working as
> a professional software developer for 25 years.  The motivation for this
> proposal is working at NOAA a decade ago - I worked with scientists who
> knew the science but didn't know they didn't know enough to write
> well-engineered numerical models.  The 'three unattributed data points'
> isn't a joke.  I've been planning to make this proposal for years, I've
> just never gotten around to it.

This is a really big project.

Luc

> 
> Bear