Posted to dev@commons.apache.org by Ole Ersoy <ol...@gmail.com> on 2015/12/31 04:33:56 UTC

[math] RealMatrixFormat.parse()

Hi,

In RealMatrixFormat.parse() MatrixUtils makes the decision on what type of RealMatrix instance to return.  Flexibility is gained if it just returns double[][], letting the caller decide what type of RealMatrix instance to create.  It's also better for modularity, as it reduces RealMatrixFormat imports (MatrixUtils supports Field matrices as well, and I'm attempting to separate real and field matrices into two different modules).
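
To make the proposal concrete, here is a minimal sketch. It is not the actual RealMatrixFormat code: RawMatrixParser is a hypothetical class, and the "{{1,2},{3,4}}" text layout is a simplifying assumption. The only point is that the parser stops at double[][] and the caller picks the wrapper.

```java
// Hypothetical parser: stops at double[][] and leaves the choice of
// RealMatrix implementation to the caller. Not the Commons Math API.
final class RawMatrixParser {

    private RawMatrixParser() {}

    /** Parses text such as "{{1,2},{3,4}}" into a rectangular array. */
    static double[][] parse(String source) {
        String s = source.trim();
        if (!s.startsWith("{{") || !s.endsWith("}}")) {
            throw new IllegalArgumentException("expected {{...},{...}}: " + source);
        }
        // Drop the outer "{{" and "}}", then split rows on "},{".
        String body = s.substring(2, s.length() - 2);
        String[] rows = body.split("\\}\\s*,\\s*\\{");
        double[][] data = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            String[] cells = rows[i].split(",");
            data[i] = new double[cells.length];
            for (int j = 0; j < cells.length; j++) {
                data[i][j] = Double.parseDouble(cells[j].trim());
            }
            // Reject ragged input so callers always get a rectangle.
            if (data[i].length != data[0].length) {
                throw new IllegalArgumentException("row " + i + " has a different length");
            }
        }
        return data;
    }
}
```

A caller could then pass the result to new BlockRealMatrix(double[][]) or MatrixUtils.createRealMatrix(double[][]), or simply keep the array.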

Also just curious if Array2DRowRealMatrix is worth keeping?  It seems like the performance of BlockRealMatrix might be just as good or better regardless of matrix size ... although my testing is limited.

Cheers,
Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [math] RealMatrixFormat.parse()

Posted by Ole Ersoy <ol...@gmail.com>.


On 12/31/2015 03:33 AM, Luc Maisonobe wrote:
> On 31/12/2015 04:33, Ole Ersoy wrote:
>
[...]
>
> Of course, using this feature is rather expert use. Typically, it is
> done when some algorithm creates the data array by itself, and then
> wants to return it as a matrix, but will not use the array by itself
> anymore. In this case, transferring ownership of the array to the matrix
> instance is not a bad thing, particularly if the array is big.
>
> I agree this case is really specific so it may not be sufficient to keep
> this class (or to keep the constructor and the special getter).

OK - I'll just leave out Array2DRowRealMatrix for now.  The BlockRealMatrix is a real gem.  Happy New Year!

Cheers,
Ole



Re: [math] RealMatrixFormat.parse()

Posted by Luc Maisonobe <lu...@spaceroots.org>.

On 31/12/2015 04:33, Ole Ersoy wrote:
> Hi,
> 
> In RealMatrixFormat.parse() MatrixUtils makes the decision on what type
> of RealMatrix instance to return.  Flexibility is gained if it just
> returns double[][] letting the caller decide what type of RealMatrix
> instance to create.  It's also better for modularity, as it reduces
> RealMatrixFormat imports (MatrixUtils supports Field matrices as
> well, and I'm attempting to separate real and field matrices into two
> different modules).
> 
> Also just curious if Array2DRowRealMatrix is worth keeping?  It seems
> like the performance of BlockRealMatrix might be just as good or better
> regardless of matrix size ... although my testing is limited.

As far as I am concerned, Array2DRowRealMatrix is used in places where
I want to avoid copying the data. This can occur both at construction
and at data retrieval. See the constructor with the boolean flag, which
simply reuses an allocated array rather than copying it, and see the
getDataRef getter.

Of course, using this feature is rather expert use. Typically, it is
done when some algorithm creates the data array by itself, and then
wants to return it as a matrix, but will not use the array by itself
anymore. In this case, transferring ownership of the array to the matrix
instance is not a bad thing, particularly if the array is big.
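
The difference between the copying and non-copying paths can be sketched with a stripped-down stand-in. DenseMatrixSketch below is hypothetical; it only mirrors the shape of the Array2DRowRealMatrix(double[][], boolean) constructor and the getData()/getDataRef() pair, and leaves out all validation.

```java
// Hypothetical stand-in that mirrors the copy/no-copy choice; it is not
// the Commons Math class itself.
final class DenseMatrixSketch {

    private final double[][] data;

    /** copyArray = false hands ownership of the caller's array to the matrix. */
    DenseMatrixSketch(double[][] d, boolean copyArray) {
        this.data = copyArray ? deepCopy(d) : d;
    }

    /** Safe accessor: returns a fresh copy on every call. */
    double[][] getData() {
        return deepCopy(data);
    }

    /** Expert accessor: returns the internal array, with no allocation. */
    double[][] getDataRef() {
        return data;
    }

    private static double[][] deepCopy(double[][] src) {
        double[][] out = new double[src.length][];
        for (int i = 0; i < src.length; i++) {
            out[i] = src[i].clone();
        }
        return out;
    }
}
```

With copyArray set to false, a later write to the original array is visible through the matrix, which is why the caller should stop using the array after handing it over.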

I agree this case is really specific so it may not be sufficient to keep
this class (or to keep the constructor and the special getter).

best regards,
Luc

> 
> Cheers,
> Ole



Re: [Off-list] Re: [math] RealMatrixFormat.parse()

Posted by Gilles <gi...@harfang.homelinux.org>.

Not really "off-list".  Sorry for the noise...

Gilles

On Fri, 01 Jan 2016 01:41:25 +0100, Gilles wrote:
>>
>> HAPPY NEW YEAR!!
>>
>> Ole
>>
>
> Thanks, Ole.
>
> Best wishes to you too,
> Gilles



[Off-list] Re: [math] RealMatrixFormat.parse()

Posted by Gilles <gi...@harfang.homelinux.org>.

>
> HAPPY NEW YEAR!!
>
> Ole
>

Thanks, Ole.

Best wishes to you too,
Gilles



Re: [math] RealMatrixFormat.parse()

Posted by Ole Ersoy <ol...@gmail.com>.


On 12/31/2015 05:42 PM, Gilles wrote:
> On Thu, 31 Dec 2015 12:54:00 -0600, Ole Ersoy wrote:
>> On 12/31/2015 11:10 AM, Gilles wrote:
>>> On Wed, 30 Dec 2015 21:33:56 -0600, Ole Ersoy wrote:
>>>> Hi,
>>>>
>>>> In RealMatrixFormat.parse() MatrixUtils makes the decision on what
>>>> type of RealMatrix instance to return.
>>>
>>> Ideally, this is correct as the actual type is an "implementation detail".
>>>> Flexibility is gained if it
>>>> just returns double[][] letting the caller decide what type of
>>>> RealMatrix instance to create.
>>>
>>> That could become a problem e.g. for sparse matrices where the persistent
>>> format and the instance type could be optimized for space, but a "double[][]"
>>> cannot be.
>> RealMatrixFormat.parse() first creates a double[][] and then it drops
>> it into the Matrix wrapper it thinks is best, per MatrixUtils. By
>> leaving out the last step the caller can either use MatrixUtils (Or
>> hopefully MatrixFactory) to perform the next step. Or maybe there is
>> no next step.  Perhaps just having a double[][] is fine.
>
> My opinion is that this code should be in a separate IO module,
> where the external format can be made more flexible and more
> correct (such as not doing unnecessary allocation).
Totally with you on that.  Ideally something along the lines of MatrixPersist and MatrixParse classes that support localized formatting.  Right now it's all bundled up into RealMatrixFormat... probably due to time constraints.  I'll look at modularizing that part later.  Right now I'm breaking up MatrixUtils into MatrixFactory and LinearExceptionFactory, and then once the dust settles I can look at the IO piece in more detail.
>
>>>> It's also better for modularity, as it
>>>> reduces RealMatrixFormat imports (MatrixUtils supports Field
>>>> matrices as well, and I'm attempting to separate real and field
>>>> matrices into two different modules).
>>>
>>> For modularity, IO should not be in the same module as the core
>>> algorithms.
>> I agree in general.  I'm sticking all the 'Real' (Excluding Field)
>> classes in one module (Vector and Matrix).  AbstractRealMatrix uses
>> RealMatrixFormat, so it's tightly coupled ATM and it seems like it
>> belongs with the real Vector and Matrix classes so...
>
> Given the major refactoring which you are attempting, why not drop
> everything that does not belong?
Good point.  I'll just strip out the formatting, etc. from AbstractRealMatrix and reintroduce it in the IO module.

>
>>>
>>>> Also just curious if Array2DRowRealMatrix is worth keeping?  It seems
>>>> like the performance of BlockRealMatrix might be just as good or
>>>> better regardless of matrix size ... although my testing is limited.
>>>
>>> I recall having performed a benchmark years ago and IIRC, the
>>> "BlockRealMatrix" started to be more efficient only for very large
>>> matrix sizes (although I don't remember which).
>> That was what I was seeing as well.  Once matrix rows reach 100K - 10
>> million, performance goes up between 2X and 5X, but I did not really
>> see any difference in performance (multiplication only) for small
>> data sets.  So I'm assuming, like Luc indicated, that the
>> Array2DRowRealMatrix is only better when attempting to reuse the
>> underlying double[][] matrix a lot...
>
> As I recall, for "small" matrices, the "Block" version was significantly
> slower. Depends what we call "large" and "small"...
Hmm - That probably makes sense since Block has to create the block structure.  I'll have a second look once I get a good profiling setup added to the module.
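
As a rough illustration of that up-front cost, here is a sketch of repacking a double[][] into flattened blocks. BlockLayoutSketch is a hypothetical class and the block size is a free parameter; this is one plausible source of overhead for small matrices, not a measured one.

```java
// Hypothetical sketch of the up-front repacking a block layout requires.
// The block size is a parameter; any specific value is an assumption.
final class BlockLayoutSketch {

    private BlockLayoutSketch() {}

    /**
     * Copies a rectangular row-major array (at least one row) into
     * flattened blocks, stored block-row by block-row.
     */
    static double[][] toBlocks(double[][] raw, int blockSize) {
        int rows = raw.length;
        int cols = raw[0].length;
        int blockRows = (rows + blockSize - 1) / blockSize;
        int blockCols = (cols + blockSize - 1) / blockSize;
        double[][] blocks = new double[blockRows * blockCols][];
        for (int bi = 0; bi < blockRows; bi++) {
            int rStart = bi * blockSize;
            int height = Math.min(blockSize, rows - rStart);
            for (int bj = 0; bj < blockCols; bj++) {
                int cStart = bj * blockSize;
                int width = Math.min(blockSize, cols - cStart);
                double[] block = new double[height * width];
                for (int r = 0; r < height; r++) {
                    // Every cell is copied exactly once: O(rows * cols) in total.
                    System.arraycopy(raw[rStart + r], cStart, block, r * width, width);
                }
                blocks[bi * blockCols + bj] = block;
            }
        }
        return blocks;
    }
}
```

For a small matrix this copy and the extra index arithmetic may outweigh any cache benefit, which would be consistent with the benchmark recollection above; a profiling run is still needed to confirm it.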

HAPPY NEW YEAR!!

Ole



Re: [math] RealMatrixFormat.parse()

Posted by Gilles <gi...@harfang.homelinux.org>.

On Thu, 31 Dec 2015 12:54:00 -0600, Ole Ersoy wrote:
> On 12/31/2015 11:10 AM, Gilles wrote:
>> On Wed, 30 Dec 2015 21:33:56 -0600, Ole Ersoy wrote:
>>> Hi,
>>>
>>> In RealMatrixFormat.parse() MatrixUtils makes the decision on what
>>> type of RealMatrix instance to return.
>>
>> Ideally, this is correct as the actual type is an "implementation detail".
>>> Flexibility is gained if it
>>> just returns double[][] letting the caller decide what type of
>>> RealMatrix instance to create.
>>
>> That could become a problem e.g. for sparse matrices where the persistent
>> format and the instance type could be optimized for space, but a "double[][]"
>> cannot be.
> RealMatrixFormat.parse() first creates a double[][] and then it drops
> it into the Matrix wrapper it thinks is best, per MatrixUtils.  By
> leaving out the last step the caller can either use MatrixUtils (Or
> hopefully MatrixFactory) to perform the next step. Or maybe there is
> no next step.  Perhaps just having a double[][] is fine.

My opinion is that this code should be in a separate IO module,
where the external format can be made more flexible and more
correct (such as not doing unnecessary allocation).

>>> It's also better for modularity, as it
>>> reduces RealMatrixFormat imports (MatrixUtils supports Field
>>> matrices as well, and I'm attempting to separate real and field
>>> matrices into two different modules).
>>
>> For modularity, IO should not be in the same module as the core
>> algorithms.
> I agree in general.  I'm sticking all the 'Real' (Excluding Field)
> classes in one module (Vector and Matrix).  AbstractRealMatrix uses
> RealMatrixFormat, so it's tightly coupled ATM and it seems like it
> belongs with the real Vector and Matrix classes so...

Given the major refactoring which you are attempting, why not drop
everything that does not belong?

>>
>>> Also just curious if Array2DRowRealMatrix is worth keeping?  It seems
>>> like the performance of BlockRealMatrix might be just as good or
>>> better regardless of matrix size ... although my testing is limited.
>>
>> I recall having performed a benchmark years ago and IIRC, the
>> "BlockRealMatrix" started to be more efficient only for very large
>> matrix sizes (although I don't remember which).
> That was what I was seeing as well.  Once matrix rows reach 100K - 10
> million, performance goes up between 2X and 5X, but I did not really
> see any difference in performance (multiplication only) for small
> data sets.  So I'm assuming, like Luc indicated, that the
> Array2DRowRealMatrix is only better when attempting to reuse the
> underlying double[][] matrix a lot...

As I recall, for "small" matrices, the "Block" version was significantly
slower. Depends what we call "large" and "small"...

Gilles

>
> Cheers,
> Ole



Re: [math] RealMatrixFormat.parse()

Posted by Ole Ersoy <ol...@gmail.com>.


On 12/31/2015 11:10 AM, Gilles wrote:
> On Wed, 30 Dec 2015 21:33:56 -0600, Ole Ersoy wrote:
>> Hi,
>>
>> In RealMatrixFormat.parse() MatrixUtils makes the decision on what
>> type of RealMatrix instance to return.
>
> Ideally, this is correct as the actual type is an "implementation detail".
>> Flexibility is gained if it
>> just returns double[][] letting the caller decide what type of
>> RealMatrix instance to create.
>
> That could become a problem e.g. for sparse matrices where the persistent
> format and the instance type could be optimized for space, but a "double[][]"
> cannot be.
RealMatrixFormat.parse() first creates a double[][] and then it drops it into the Matrix wrapper it thinks is best, per MatrixUtils.  By leaving out the last step the caller can either use MatrixUtils (Or hopefully MatrixFactory) to perform the next step. Or maybe there is no next step.  Perhaps just having a double[][] is fine.
>
>> It's also better for modularity, as it
>> reduces RealMatrixFormat imports (MatrixUtils supports Field
>> matrices as well, and I'm attempting to separate real and field
>> matrices into two different modules).
>
> For modularity, IO should not be in the same module as the core
> algorithms.
I agree in general.  I'm sticking all the 'Real' (Excluding Field) classes in one module (Vector and Matrix).  AbstractRealMatrix uses RealMatrixFormat, so it's tightly coupled ATM and it seems like it belongs with the real Vector and Matrix classes so...

>
>> Also just curious if Array2DRowRealMatrix is worth keeping?  It seems
>> like the performance of BlockRealMatrix might be just as good or
>> better regardless of matrix size ... although my testing is limited.
>
> I recall having performed a benchmark years ago and IIRC, the
> "BlockRealMatrix" started to be more efficient only for very large
> matrix sizes (although I don't remember which).
That was what I was seeing as well.  Once matrix rows reach 100K - 10 million, performance goes up between 2X and 5X, but I did not really see any difference in performance (multiplication only) for small data sets.  So I'm assuming, like Luc indicated, that the Array2DRowRealMatrix is only better when attempting to reuse the underlying double[][] matrix a lot...

Cheers,
Ole



Re: [math] RealMatrixFormat.parse()

Posted by Gilles <gi...@harfang.homelinux.org>.

On Wed, 30 Dec 2015 21:33:56 -0600, Ole Ersoy wrote:
> Hi,
>
> In RealMatrixFormat.parse() MatrixUtils makes the decision on what
> type of RealMatrix instance to return.

Ideally, this is correct as the actual type is an "implementation detail".

> Flexibility is gained if it
> just returns double[][] letting the caller decide what type of
> RealMatrix instance to create.

That could become a problem e.g. for sparse matrices where the persistent
format and the instance type could be optimized for space, but a "double[][]"
cannot be.
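
A small sketch of the space argument, under stated assumptions: SparseEntriesSketch is a hypothetical coordinate-list holder, not a Commons Math class, and it only counts stored doubles to show what is lost when the data is forced through double[][].

```java
// Hypothetical coordinate-list (row, column, value) holder. It only
// illustrates storage cost and is not a Commons Math class.
final class SparseEntriesSketch {

    private final int rows;
    private final int cols;
    private final int[] rowIdx;
    private final int[] colIdx;
    private final double[] values;

    SparseEntriesSketch(int rows, int cols, int[] rowIdx, int[] colIdx, double[] values) {
        if (rowIdx.length != values.length || colIdx.length != values.length) {
            throw new IllegalArgumentException("index and value arrays must have equal length");
        }
        this.rows = rows;
        this.cols = cols;
        this.rowIdx = rowIdx.clone();
        this.colIdx = colIdx.clone();
        this.values = values.clone();
    }

    /** Doubles held by the sparse form: one per stored entry. */
    long storedValues() {
        return values.length;
    }

    /** Doubles a double[][] of the same shape must hold, zeros included. */
    long denseValues() {
        return (long) rows * cols;
    }

    /** Expanding to double[][] allocates every cell, even the empty ones. */
    double[][] toDense() {
        double[][] dense = new double[rows][cols];
        for (int k = 0; k < values.length; k++) {
            dense[rowIdx[k]][colIdx[k]] = values[k];
        }
        return dense;
    }
}
```

For a 1000 x 1000 matrix with three stored entries, the sparse form holds three doubles while the dense array holds a million, so a parser that always returns double[][] pays that cost before the caller can choose anything.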

> It's also better for modularity, as it
> reduces RealMatrixFormat imports (MatrixUtils supports Field
> matrices as well, and I'm attempting to separate real and field
> matrices into two different modules).

For modularity, IO should not be in the same module as the core
algorithms.

> Also just curious if Array2DRowRealMatrix is worth keeping?  It seems
> like the performance of BlockRealMatrix might be just as good or
> better regardless of matrix size ... although my testing is limited.

I recall having performed a benchmark years ago and IIRC, the
"BlockRealMatrix" started to be more efficient only for very large
matrix sizes (although I don't remember which).

Regards,
Gilles
