Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2011/03/01 09:12:07 UTC

Mahalanobis users out there?

Does anybody use the Mahalanobis distance measure class? If so, what 
for? And how do you prepare the input matrices?

Lance

Re: Mahalanobis users out there?

Posted by Vasil Vasilev <va...@gmail.com>.
Hi,

I provided a newer version of the fix, plus two tests to verify it. Can
someone take a look?

Regards, Vasil


Re: Mahalanobis users out there?

Posted by Ted Dunning <te...@gmail.com>.
Good fellow!

I will take a quick look.


Re: Mahalanobis users out there?

Posted by Vasil Vasilev <va...@gmail.com>.
Hi Ted,

The code in my earlier message is an example of how to use
MahalanobisDistanceMeasure. For the problems I ran into, I created a JIRA
issue and attached a patch to it:
https://issues.apache.org/jira/browse/MAHOUT-616

Regards, Vasil


Re: Mahalanobis users out there?

Posted by Ted Dunning <te...@gmail.com>.
Vasil,

If you are suggesting a change in Mahout, can you go to
https://issues.apache.org/jira/browse/MAHOUT and file an issue with a
patch?

In case the terminology is new for you, an issue is a bug report or
enhancement request, and a patch is the output of svn diff or
git format-patch.

You can get more information about this process here:
https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute


Re: Mahalanobis users out there?

Posted by Vasil Vasilev <va...@gmail.com>.
Hi Lance,

I ran a small test with the Mahalanobis distance measure and Dirichlet
clustering. Unfortunately it did not work on the first attempt, because
the measure's "configure" method was never called.
I made some changes to the Mahout code to get it running and used the
following code in the
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job class:

  /**
   * Run the job using supplied arguments, deleting the output directory
   * if it exists beforehand
   *
   * @param input
   *          the directory pathname for input points
   * @param output
   *          the directory pathname for output points
   * @param modelDistribution
   *          the ModelDistribution
   * @param numModels
   *          the number of Models
   * @param maxIterations
   *          the maximum number of iterations
   * @param alpha0
   *          the alpha0 value for the DirichletDistribution
   * @param emitMostLikely
   *          passed through to DirichletDriver.run
   * @param threshold
   *          passed through to DirichletDriver.run
   */
  public void run(Path input,
                  Path output,
                  ModelDistribution<VectorWritable> modelDistribution,
                  int numModels,
                  int maxIterations,
                  double alpha0,
                  boolean emitMostLikely,
                  double threshold)
    throws IOException, ClassNotFoundException, InstantiationException,
           IllegalAccessException, SecurityException, InterruptedException {
    Configuration conf = new Configuration();

    if (modelDistribution instanceof DistanceMeasureClusterDistribution) {
      DistanceMeasure measure =
          ((DistanceMeasureClusterDistribution) modelDistribution).getMeasure();
      if (measure instanceof MahalanobisDistanceMeasure) {
        MahalanobisDistanceMeasure mahalanobis = (MahalanobisDistanceMeasure) measure;

        // Set the mean vector and the covariance matrix (identity here)
        // directly on the measure.
        Vector meanVector = new DenseVector(new double[] {0.0, 22.0, 25.0});
        mahalanobis.setMeanVector(meanVector);
        Matrix m = new DenseMatrix(new double[][] {
            {1.0, 0.0, 0.0},
            {0.0, 1.0, 0.0},
            {0.0, 0.0, 1.0}});
        mahalanobis.setCovarianceMatrix(m);

        // Write the inverse covariance matrix to a file and record its
        // location in the configuration.
        Path inverseCovarianceFile =
            new Path("output/MahalanobisDistanceMeasureInverseCovarianceFile");
        conf.set("MahalanobisDistanceMeasure.inverseCovarianceFile",
                 "output/MahalanobisDistanceMeasureInverseCovarianceFile");
        FileSystem fs = FileSystem.get(inverseCovarianceFile.toUri(), conf);
        MatrixWritable inverseCovarianceMatrix =
            new MatrixWritable(mahalanobis.getInverseCovarianceMatrix());
        DataOutputStream out = fs.create(inverseCovarianceFile);
        try {
          inverseCovarianceMatrix.write(out);
        } finally {
          out.close();
        }

        // Do the same for the mean vector.
        Path meanVectorFile =
            new Path("output/MahalanobisDistanceMeasureMeanVectorFile");
        conf.set("MahalanobisDistanceMeasure.meanVectorFile",
                 "output/MahalanobisDistanceMeasureMeanVectorFile");
        fs = FileSystem.get(meanVectorFile.toUri(), conf);
        VectorWritable meanVectorWritable = new VectorWritable(meanVector);
        out = fs.create(meanVectorFile);
        try {
          meanVectorWritable.write(out);
        } finally {
          out.close();
        }

        // Record which Writable classes were used to serialize the two files.
        conf.set("MahalanobisDistanceMeasure.maxtrixClass",
                 MatrixWritable.class.getName());
        conf.set("MahalanobisDistanceMeasure.vectorClass",
                 VectorWritable.class.getName());
      }
    }

    Path directoryContainingConvertedInput =
        new Path(output, DIRECTORY_CONTAINING_CONVERTED_INPUT);
    SynthInputDriver.runJob(input, directoryContainingConvertedInput,
                            "org.apache.mahout.math.RandomAccessSparseVector");
    //InputDriver.runJob(input, directoryContainingConvertedInput,
    //                   "org.apache.mahout.math.RandomAccessSparseVector");
    DirichletDriver.run(conf, directoryContainingConvertedInput,
                        output,
                        modelDistribution,
                        numModels,
                        maxIterations,
                        alpha0,
                        true,
                        emitMostLikely,
                        threshold,
                        true);

    try {
      ClusteredPointsConverter.convertClusteredPoints(
          directoryContainingConvertedInput,
          new Path(output, "clusteredPoints"),
          new Path(output, "convertedClusteredPoints"),
          "org.apache.mahout.math.RandomAccessSparseVector");
    } catch (InvocationTargetException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }

    // run ClusterDumper
    ClusterDumper clusterDumper =
        new ClusterDumper(new Path(output, "clusters-" + maxIterations),
                          new Path(output, "convertedClusteredPoints"));
    clusterDumper.printClusters(null);
  }
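
For readers who want to see what the mean vector and the inverse covariance
matrix are used for, here is a minimal plain-Java sketch of the textbook
Mahalanobis distance, sqrt((x - mean)^T * S^-1 * (x - mean)). It is only an
illustration and has not been checked against Mahout's implementation. The
class name MahalanobisSketch, the method name distance, and the sample point
are hypothetical; only the mean vector {0.0, 22.0, 25.0} and the identity
covariance matrix come from the code above.

```java
// Plain-Java illustration; none of these names come from Mahout.
class MahalanobisSketch {

    /**
     * Returns sqrt((x - mean)^T * inverseCovariance * (x - mean)).
     * The caller supplies the already inverted covariance matrix.
     */
    static double distance(double[] x, double[] mean, double[][] inverseCovariance) {
        int n = x.length;
        if (mean.length != n || inverseCovariance.length != n) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        // Difference between the point and the mean.
        double[] diff = new double[n];
        for (int i = 0; i < n; i++) {
            diff[i] = x[i] - mean[i];
        }
        // Quadratic form diff^T * inverseCovariance * diff.
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            if (inverseCovariance[i].length != n) {
                throw new IllegalArgumentException("matrix must be square");
            }
            double row = 0.0;
            for (int j = 0; j < n; j++) {
                row += inverseCovariance[i][j] * diff[j];
            }
            sum += diff[i] * row;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] mean = {0.0, 22.0, 25.0};      // mean vector from the post
        double[][] identity = {                 // the inverse of the identity is the identity
            {1.0, 0.0, 0.0},
            {0.0, 1.0, 0.0},
            {0.0, 0.0, 1.0}};
        double[] point = {3.0, 26.0, 25.0};     // hypothetical sample point
        System.out.println(distance(point, mean, identity)); // prints 5.0
    }
}
```

With an identity covariance matrix, as in the test above, the result is the
same as the Euclidean distance to the mean, so a non-trivial covariance
matrix is needed before the measure behaves any differently.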
