Posted to dev@systemml.apache.org by Deron Eriksson <de...@gmail.com> on 2016/02/16 00:45:36 UTC

Matrix Market format with metadata file

Hi,

The Matrix Market coordinate format contains the number of rows, columns, and
non-zero values as metadata near the top of a matrix data file.

If I write a matrix in mm format using SystemML, no metadata file is
created since the metadata is stored within the data file.

However, when reading a matrix in mm format, I can supply a metadata
file even though the metadata already exists in the matrix data file. Is
there a reason for this, or should it be disallowed? The metadata file is
redundant and can cause confusion: metadata values can then be specified
in two places, which raises the question of which value should be used.

Deron

Re: Matrix Market format with metadata file

Posted by Deron Eriksson <de...@gmail.com>.
Thank you, Shirish. That makes sense. I'll update the docs to include this
information.

Deron


On Mon, Feb 15, 2016 at 4:26 PM, Shirish Tatikonda <
shirish.tatikonda@gmail.com> wrote:

> The "mm" and "text" formats are identical except for two
> differences:
>
> 1) for "mm", the matrix metadata is included in the first two lines;
> for "text", the metadata is in the associated .mtd file.
> 2) "mm" data must be in a single file (i.e., no *part* files), whereas
> "text" data can span multiple *part* files (like any other file on HDFS).
>
> Support for "mm" was added mainly for the purpose of
> importing/exporting data in the format that R likes.
>
> Shirish

Re: Matrix Market format with metadata file

Posted by Shirish Tatikonda <sh...@gmail.com>.
Ok. Cool.

On Mon, Feb 15, 2016 at 4:57 PM, Deron Eriksson <de...@gmail.com>
wrote:

> Very good eye! I used "m = matrix("1 2 3 0 0 0 7 8 9 0 0 0", rows=4,
> cols=3)" to generate the mm file, so the 4th row did indeed contain all
> zeros.

Re: Matrix Market format with metadata file

Posted by Deron Eriksson <de...@gmail.com>.
Very good eye! I used "m = matrix("1 2 3 0 0 0 7 8 9 0 0 0", rows=4,
cols=3)" to generate the mm file, so the 4th row did indeed contain all
zeros.
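Deron's explanation can be reproduced with a small Python sketch, assuming DML's matrix() fills the string's values row by row (which matches the MM file in Deron's earlier message). The helper names are hypothetical. Rows 2 and 4 come out all zero, so only rows 1 and 3 appear among the six cells, while the size line still declares 4 rows.

```python
# Hypothetical helpers that mimic the DML call
#   m = matrix("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
# by filling the values row by row (assumed row-major), then listing the
# non-zero cells with 1-based (i, j) indices as the mm and text formats do.


def dml_like_matrix(values, rows, cols):
    """Build a rows x cols list-of-lists from a space-separated string."""
    nums = [float(tok) for tok in values.split()]
    if len(nums) != rows * cols:
        raise ValueError("number of values must equal rows * cols")
    return [nums[r * cols:(r + 1) * cols] for r in range(rows)]


def nonzero_cells(m):
    """Return (i, j, v) triples for the non-zero entries, 1-based indices."""
    return [(i + 1, j + 1, v)
            for i, row in enumerate(m)
            for j, v in enumerate(row)
            if v != 0.0]


m = dml_like_matrix("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3)
cells = nonzero_cells(m)
print(len(m), len(m[0]), len(cells))  # prints: 4 3 6
for i, j, v in cells:
    print(i, j, v)  # only rows 1 and 3 appear; rows 2 and 4 are all zeros
```

An empty trailing row therefore leaves no trace in the cell lines; only the header's row count records it.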


On Mon, Feb 15, 2016 at 4:50 PM, Shirish Tatikonda <
shirish.tatikonda@gmail.com> wrote:

> Btw (just to be precise), in your example "mm" file, the metadata is "4
> 3 6", but the non-zero values only go up to row 3. So
> either it was a typo or the 4th row contains all zeros.

Re: Matrix Market format with metadata file

Posted by Shirish Tatikonda <sh...@gmail.com>.
Btw (just to be precise), in your example "mm" file, the metadata is "4
3 6", but the non-zero values only go up to row 3. So
either it was a typo or the 4th row contains all zeros.



On Mon, Feb 15, 2016 at 4:26 PM, Shirish Tatikonda <
shirish.tatikonda@gmail.com> wrote:

> The "mm" and "text" formats are identical except for two
> differences:
>
> 1) for "mm", the matrix metadata is included in the first two lines;
> for "text", the metadata is in the associated .mtd file.
> 2) "mm" data must be in a single file (i.e., no *part* files), whereas
> "text" data can span multiple *part* files (like any other file on HDFS).
>
> Support for "mm" was added mainly for the purpose of
> importing/exporting data in the format that R likes.
>
> Shirish

Re: Matrix Market format with metadata file

Posted by Shirish Tatikonda <sh...@gmail.com>.
The "mm" and "text" formats are identical except for two
differences:

1) for "mm", the matrix metadata is included in the first two lines;
for "text", the metadata is in the associated .mtd file.
2) "mm" data must be in a single file (i.e., no *part* files), whereas
"text" data can span multiple *part* files (like any other file on HDFS).

Support for "mm" was added mainly for the purpose of
importing/exporting data in the format that R likes.

Shirish
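Shirish's two differences can be illustrated with a minimal Python sketch that writes the same cells both ways: "mm" as one self-describing file, and "text" as bare i j v lines split across part files plus a separate metadata document. The function names, the Hadoop-style "part-00000" naming, and the .mtd keys are assumptions for illustration, not SystemML's actual writer.

```python
import json

# Illustrative sketch only: "mm" keeps its metadata inside a single data file,
# while "text" keeps only i j v cells (possibly spread over several part
# files) and puts rows/cols/nnz in a separate .mtd document. Part-file names
# and .mtd keys are assumed, not taken from SystemML.


def write_mm(rows, cols, cells):
    """Return the contents of a single Matrix Market coordinate file."""
    header = ["%%MatrixMarket matrix coordinate real general",
              f"{rows} {cols} {len(cells)}"]
    body = [f"{i} {j} {v}" for i, j, v in cells]
    return "\n".join(header + body) + "\n"


def write_text(rows, cols, cells, num_parts=2):
    """Return ({part_name: contents}, mtd_json) for the textcell layout."""
    parts = {}
    for p in range(num_parts):
        chunk = cells[p::num_parts]  # round-robin split across part files
        parts[f"part-{p:05d}"] = "".join(f"{i} {j} {v}\n" for i, j, v in chunk)
    mtd = json.dumps({"data_type": "matrix", "value_type": "double",
                      "rows": rows, "cols": cols, "nnz": len(cells),
                      "format": "text"}, indent=4)
    return parts, mtd


cells = [(1, 1, 1.0), (1, 2, 2.0), (1, 3, 3.0),
         (3, 1, 7.0), (3, 2, 8.0), (3, 3, 9.0)]
print(write_mm(4, 3, cells))
parts, mtd = write_text(4, 3, cells)
for name, text in parts.items():
    print(name)
    print(text, end="")
print(mtd)
```

Because each text part holds self-contained cells, parts can be written and read independently; an mm reader needs the single file's header first, which is one plausible reason text is the more scalable choice.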

On Mon, Feb 15, 2016 at 4:17 PM, Deron Eriksson <de...@gmail.com>
wrote:

> Hi,
>
> I have a question regarding text vs. mm. Isn't the mm coordinate
> format identical to the text format, except that the mm data file also
> includes a metadata line for rows, cols, and nnz? If so, shouldn't they
> scale the same, since the text rows (i,j,v) correspond to the mm rows?
>
> If we have the following MM:
> %%MatrixMarket matrix coordinate real general
> 4 3 6
> 1 1 1.0
> 1 2 2.0
> 1 3 3.0
> 3 1 7.0
> 3 2 8.0
> 3 3 9.0
>
> The corresponding text format (with accompanying metadata file) is:
> 1 1 1.0
> 1 2 2.0
> 1 3 3.0
> 3 1 7.0
> 3 2 8.0
> 3 3 9.0
>
> So aren't these formats essentially the same?
>
> Deron

Re: Matrix Market format with metadata file

Posted by Deron Eriksson <de...@gmail.com>.
Hi,

I have a question regarding text vs. mm. Isn't the mm coordinate
format identical to the text format, except that the mm data file also
includes a metadata line for rows, cols, and nnz? If so, shouldn't they
scale the same, since the text rows (i,j,v) correspond to the mm rows?

If we have the following MM:
%%MatrixMarket matrix coordinate real general
4 3 6
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0

The corresponding text format (with accompanying metadata file) is:
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0

So aren't these formats essentially the same?

Deron
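The correspondence Deron describes can be checked with a short Python sketch: stripping the banner and size line from the MM example leaves exactly the i j v lines of the text format, and the size line supplies what the .mtd file would otherwise hold. The function name `mm_to_textcell` and the metadata keys are illustrative assumptions, not SystemML's reader.

```python
# Sketch: split a Matrix Market coordinate file into its header metadata and
# the (i, j, v) cell lines that the "text" (textcell) format would contain.
# The metadata keys mirror rows/cols/nnz from a .mtd file; the exact key set
# is an assumption.

MM_EXAMPLE = """%%MatrixMarket matrix coordinate real general
4 3 6
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0
"""


def mm_to_textcell(mm_text):
    """Return (metadata, cell_lines) for a coordinate-format MM string."""
    lines = [ln.strip() for ln in mm_text.splitlines() if ln.strip()]
    if not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket banner")
    # Skip the banner and any further '%' comment lines.
    body = [ln for ln in lines[1:] if not ln.startswith("%")]
    rows, cols, nnz = (int(tok) for tok in body[0].split())
    metadata = {"rows": rows, "cols": cols, "nnz": nnz, "format": "text"}
    # Every remaining line is already an "i j v" cell, i.e., a textcell line.
    return metadata, body[1:]


meta, cells = mm_to_textcell(MM_EXAMPLE)
print(meta)
print("\n".join(cells))
```

The cell lines are byte-for-byte the same in both formats; the formats differ only in where the header lives and whether the data may be split into parts.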


On Mon, Feb 15, 2016 at 3:56 PM, Matthias Boehm <mb...@us.ibm.com> wrote:

> The metadata file is still useful for determining the format. In the case
> of Matrix Market, errors will be raised if the included metadata is
> inconsistent. So no, we should not disallow specifying the metadata. In
> general, we recommend using text (textcell) instead of mm (Matrix Market)
> for scalability reasons.
>
> Regards,
> Matthias

Re: Matrix Market format with metadata file

Posted by Matthias Boehm <mb...@us.ibm.com>.
The metadata file is still useful for determining the format. In the case
of Matrix Market, errors will be raised if the included metadata is
inconsistent. So no, we should not disallow specifying the metadata. In
general, we recommend using text (textcell) instead of mm (Matrix Market)
for scalability reasons.

Regards,
Matthias
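The consistency check Matthias mentions can be sketched in Python: when both the MM size line and a separate metadata dict are supplied, any disagreement raises an error instead of silently picking one value. This is a hypothetical illustration; `check_mm_metadata` and its dict keys are assumptions, and SystemML's actual checks may differ.

```python
# Hypothetical consistency check: compare the MM size line against supplied
# metadata (e.g., parsed from a .mtd file) and against the cells themselves.
# Raises ValueError on any mismatch; SystemML's real checks may differ.


def check_mm_metadata(mm_text, mtd):
    """Raise ValueError if the MM size line disagrees with mtd or the cells."""
    lines = [ln.strip() for ln in mm_text.splitlines() if ln.strip()]
    body = [ln for ln in lines if not ln.startswith("%")]  # drop banner/comments
    rows, cols, nnz = (int(tok) for tok in body[0].split())
    for key, value in (("rows", rows), ("cols", cols), ("nnz", nnz)):
        if key in mtd and mtd[key] != value:
            raise ValueError(f"{key}: MM header says {value}, metadata says {mtd[key]}")
    cells = body[1:]
    if len(cells) != nnz:
        raise ValueError(f"header declares {nnz} entries, file has {len(cells)}")
    for cell in cells:
        i, j = (int(tok) for tok in cell.split()[:2])
        if not (1 <= i <= rows and 1 <= j <= cols):
            raise ValueError(f"cell ({i}, {j}) outside {rows} x {cols}")


MM = "%%MatrixMarket matrix coordinate real general\n4 3 2\n1 1 1.0\n3 3 9.0\n"
check_mm_metadata(MM, {"rows": 4, "cols": 3})      # consistent: no error
try:
    check_mm_metadata(MM, {"rows": 5, "cols": 3})  # conflicting row count
except ValueError as err:
    print(err)
```

Failing loudly on a conflict answers the "which value should be used" question by never having to choose.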



From:	Deron Eriksson <de...@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	02/15/2016 03:45 PM
Subject:	Matrix Market format with metadata file



Hi,

The Matrix Market coordinate format contains the number of rows, columns, and
non-zero values as metadata near the top of a matrix data file.

If I write a matrix in mm format using SystemML, no metadata file is
created since the metadata is stored within the data file.

However, when reading a matrix in mm format, I can supply a metadata
file even though the metadata already exists in the matrix data file. Is
there a reason for this, or should it be disallowed? The metadata file is
redundant and can cause confusion: metadata values can then be specified
in two places, which raises the question of which value should be used.

Deron