Posted to dev@commons.apache.org by Gilles <gi...@harfang.homelinux.org> on 2016/01/01 00:42:29 UTC

Re: [math] RealMatrixFormat.parse()

On Thu, 31 Dec 2015 12:54:00 -0600, Ole Ersoy wrote:
> On 12/31/2015 11:10 AM, Gilles wrote:
>> On Wed, 30 Dec 2015 21:33:56 -0600, Ole Ersoy wrote:
>>> Hi,
>>>
>>> In RealMatrixFormat.parse() MatrixUtils makes the decision on what
>>> type of RealMatrix instance to return.
>>
>> Ideally, this is correct as the actual type is an "implementation detail".
>>> Flexibility is gained if it
>>> just returns double[][] letting the caller decide what type of
>>> RealMatrix instance to create.
>>
>> That could become a problem e.g. for sparse matrices where the persistent
>> format and the instance type could be optimized for space, but a "double[][]"
>> cannot be.
> RealMatrixFormat.parse() first creates a double[][] and then it drops
> it into the Matrix wrapper it thinks is best, per MatrixUtils.  By
> leaving out the last step the caller can either use MatrixUtils (Or
> hopefully MatrixFactory) to perform the next step. Or maybe there is
> no next step.  Perhaps just having a double[][] is fine.

My opinion is that this code should be in a separate IO module,
where the external format can be made more flexible and more
correct (such as not doing unnecessary allocation).
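A minimal sketch of the "just return double[][]" idea, assuming the brace
and comma layout "{{1,2},{3,4}}" and no localization. MatrixTextParser and
its parse() method are hypothetical names used for illustration only; this
is not the RealMatrixFormat code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Commons Math. */
final class MatrixTextParser {

    private MatrixTextParser() {}

    /**
     * Parses text such as "{{1,2},{3,4}}" into a rectangular double[][].
     * Only this brace/comma layout is handled (assumption).
     */
    static double[][] parse(String text) {
        String s = text.trim();
        if (s.length() < 4 || !s.startsWith("{{") || !s.endsWith("}}")) {
            throw new IllegalArgumentException("expected {{...},{...}}: " + text);
        }
        // Strip the outer "{{" and "}}", then split rows on "},{".
        String body = s.substring(2, s.length() - 2);
        String[] rows = body.split("\\}\\s*,\\s*\\{");
        List<double[]> parsed = new ArrayList<>();
        for (String row : rows) {
            String[] cells = row.split(",");
            double[] values = new double[cells.length];
            for (int j = 0; j < cells.length; j++) {
                // Throws NumberFormatException on an empty or malformed cell.
                values[j] = Double.parseDouble(cells[j].trim());
            }
            if (!parsed.isEmpty() && parsed.get(0).length != values.length) {
                throw new IllegalArgumentException("ragged rows in: " + text);
            }
            parsed.add(values);
        }
        // The caller decides what, if anything, to wrap this array in.
        return parsed.toArray(new double[0][]);
    }
}
```

The caller could then pass the result to MatrixUtils.createRealMatrix(), to
a specific constructor such as new BlockRealMatrix(data), or use the array
directly. As noted above, this does not address the sparse case.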

>>> It's also better for modularity, as it
>>> reduces RealMatrixFormat imports (The MatrixUtils supports Field
>>> matrices as well, and I'm attempting to separate real and field
>>> matrices into two different modules).
>>
>> For modularity, IO should not be in the same module as the core
>> algorithms.
> I agree in general.  I'm sticking all the 'Real' (Excluding Field)
> classes in one module (Vector and Matrix).  AbstractRealMatrix uses
> RealMatrixFormat, so it's tightly coupled ATM and it seems like it
> belongs with the real Vector and Matrix classes so...

Given the major refactoring which you are attempting, why not drop
everything that does not belong?

>>
>>> Also just curious if Array2DRowRealMatrix is worth keeping?  It seems
>>> like the performance of BlockRealMatrix might be just as good or
>>> better regardless of matrix size ... although my testing is limited.
>>
>> I recall having performed a benchmark years ago and IIRC, the
>> "BlockRealMatrix" started to be faster only for very large matrix sizes
>> (although I don't remember which).
> That was what I was seeing as well.  Once matrix rows reach 100K - 10
> million performance goes up between 2X and 5X, but I did not really
> see any difference in performance (multiplication only) for small
> data sets.  So I'm assuming, like Luc indicated, that the
> Array2DRowRealMatrix is only better when attempting to reuse the
> underlying double[][] matrix a lot...

As I recall, for "small" matrices, the "Block" version was significantly
slower. Depends what we call "large" and "small"...

Gilles

>
> Cheers,
> Ole


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Off-list] Re: [math] RealMatrixFormat.parse()

Posted by Gilles <gi...@harfang.homelinux.org>.
Not really "off-list".  Sorry for the noise...

Gilles

On Fri, 01 Jan 2016 01:41:25 +0100, Gilles wrote:
>>
>> HAPPY NEW YEAR!!
>>
>> Ole
>>
>
> Thanks, Ole.
>
> Best wishes to you too,
> Gilles




[Off-list] Re: [math] RealMatrixFormat.parse()

Posted by Gilles <gi...@harfang.homelinux.org>.
>
> HAPPY NEW YEAR!!
>
> Ole
>

Thanks, Ole.

Best wishes to you too,
Gilles




Re: [math] RealMatrixFormat.parse()

Posted by Ole Ersoy <ol...@gmail.com>.

On 12/31/2015 05:42 PM, Gilles wrote:
> On Thu, 31 Dec 2015 12:54:00 -0600, Ole Ersoy wrote:
>> On 12/31/2015 11:10 AM, Gilles wrote:
>>> On Wed, 30 Dec 2015 21:33:56 -0600, Ole Ersoy wrote:
>>>> Hi,
>>>>
>>>> In RealMatrixFormat.parse() MatrixUtils makes the decision on what
>>>> type of RealMatrix instance to return.
>>>
>>> Ideally, this is correct as the actual type is an "implementation detail".
>>>> Flexibility is gained if it
>>>> just returns double[][] letting the caller decide what type of
>>>> RealMatrix instance to create.
>>>
>>> That could become a problem e.g. for sparse matrices where the persistent
>>> format and the instance type could be optimized for space, but a "double[][]"
>>> cannot be.
>> RealMatrixFormat.parse() first creates a double[][] and then it drops
>> it into the Matrix wrapper it thinks is best, per MatrixUtils. By
>> leaving out the last step the caller can either use MatrixUtils (Or
>> hopefully MatrixFactory) to perform the next step. Or maybe there is
>> no next step.  Perhaps just having a double[][] is fine.
>
> My opinion is that this code should be in a separate IO module.
> where the external format can be made more flexible and more
> correct (such as not doing unnecessary allocation).
Totally with you on that.  Ideally something along the lines of MatrixPersist and MatrixParse classes that support localized formatting.  Right now it's all bundled up into RealMatrixFormat...probably due to time constraints.  I'll look at modularizing that part later.  Right now I'm breaking up MatrixUtils into MatrixFactory and LinearExceptionFactory, and then once the dust settles I can look at the IO piece in more detail.
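
A rough sketch of what the writing half of such an IO piece might look like,
assuming it works on a plain double[][] and takes the locale as a parameter.
MatrixTextWriter, format() and the columnSeparator argument are hypothetical
names for illustration; only java.text.NumberFormat is a real API here.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical writer for illustration; not part of Commons Math. */
final class MatrixTextWriter {

    private MatrixTextWriter() {}

    /**
     * Formats a double[][] as "{{a,b},{c,d}}" using the number format
     * of the given locale. Rows are always separated by ','.
     */
    static String format(double[][] data, Locale locale, String columnSeparator) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        // No thousands separators, so entries stay easy to split again.
        nf.setGroupingUsed(false);
        // Note: the default NumberFormat keeps at most 3 fraction digits,
        // so this output is for display and is not round-trip safe.
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('{');
            for (int j = 0; j < data[i].length; j++) {
                if (j > 0) {
                    sb.append(columnSeparator);
                }
                sb.append(nf.format(data[i][j]));
            }
            sb.append('}');
        }
        return sb.append('}').toString();
    }
}
```

The column separator is a parameter because locales with a decimal comma
would otherwise make "," ambiguous between entries and fraction digits.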
>
>>>> It's also better for modularity, as it
>>>> reduces RealMatrixFormat imports (The MatrixUtils supports Field
>>>> matrices as well, and I'm attempting to separate real and field
>>>> matrices into two different modules).
>>>
>>> For modularity, IO should not be in the same module as the core
>>> algorithms.
>> I agree in general.  I'm sticking all the 'Real' (Excluding Field)
>> classes in one module (Vector and Matrix).  AbstractRealMatrix uses
>> RealMatrixFormat, so it's tightly coupled ATM and it seems like it
>> belongs with the real Vector and Matrix classes so...
>
> Given the major refactoring which you are attempting, why not drop
> everything that does not belong?
Good point.  I'll just strip out the formatting, etc. from AbstractRealMatrix and reintroduce it in the IO module.

>
>>>
>>>> Also just curious if Array2DRowRealMatrix is worth keeping?  It seems
>>>> like the performance of BlockRealMatrix might be just as good or
>>>> better regardless of matrix size ... although my testing is limited.
>>>
>>> I recall having performed a benchmark years ago and IIRC, the
>>> "BlockRealMatrix" started to be faster only for very large matrix sizes
>>> (although I don't remember which).
>> That was what I was seeing as well.  Once matrix rows reach 100K - 10
>> million performance goes up between 2X and 5X, but I did not really
>> see any difference in performance (multiplication only) for small
>> data sets.  So I'm assuming, like Luc indicated, that the
>> Array2DRowRealMatrix is only better when attempting to reuse the
>> underlying double[][] matrix a lot...
>
> As I recall, for "small" matrices, the "Block" version was significantly
> slower. Depends what we call "large" and "small"...
Hmm - That probably makes sense since Block has to create the block structure.  I'll have a second look once I get a good profiling setup added to the module.
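
To make the "create the block structure" cost concrete, here is a small
sketch of copying a double[][] into fixed-size tiles. This is not the
BlockRealMatrix source; BlockLayoutSketch, the tile layout and the blockSize
parameter are assumptions chosen for illustration.

```java
/** Hypothetical illustration of a tiled layout; not Commons Math code. */
final class BlockLayoutSketch {

    private BlockLayoutSketch() {}

    /**
     * Copies a non-empty rectangular double[][] into tiles of side
     * blockSize. Each tile is flattened row-major; edge tiles are smaller.
     */
    static double[][] toBlocks(double[][] data, int blockSize) {
        int rows = data.length;
        int cols = data[0].length;
        int blockRows = (rows + blockSize - 1) / blockSize;
        int blockCols = (cols + blockSize - 1) / blockSize;
        double[][] blocks = new double[blockRows * blockCols][];
        for (int bi = 0; bi < blockRows; bi++) {
            int rStart = bi * blockSize;
            int height = Math.min(blockSize, rows - rStart);
            for (int bj = 0; bj < blockCols; bj++) {
                int cStart = bj * blockSize;
                int width = Math.min(blockSize, cols - cStart);
                // Every entry is copied once: this is the up-front cost.
                double[] block = new double[height * width];
                for (int r = 0; r < height; r++) {
                    System.arraycopy(data[rStart + r], cStart, block, r * width, width);
                }
                blocks[bi * blockCols + bj] = block;
            }
        }
        return blocks;
    }

    /** Reads entry (row, col) back out of the tiled layout. */
    static double getEntry(double[][] blocks, int cols, int blockSize, int row, int col) {
        int blockCols = (cols + blockSize - 1) / blockSize;
        int bj = col / blockSize;
        int width = Math.min(blockSize, cols - bj * blockSize);
        double[] block = blocks[(row / blockSize) * blockCols + bj];
        return block[(row % blockSize) * width + (col % blockSize)];
    }
}
```

Both the full copy and the extra index arithmetic per access are paid
regardless of size, which would be consistent with small matrices seeing no
benefit, though only profiling can confirm that.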

HAPPY NEW YEAR!!

Ole

