Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2011/03/01 09:12:07 UTC
Mahalanobis users out there?
Does anybody use the Mahalanobis distance measure class? If so, what
for? And how do you prepare the input matrices?
Lance
Re: Mahalanobis users out there?
Posted by Vasil Vasilev <va...@gmail.com>.
Hi,
I provided a newer version of the fix + 2 tests for verification. Can
someone take a look?
Regards, Vasil
On Sun, Mar 6, 2011 at 7:44 PM, Ted Dunning <te...@gmail.com> wrote:
> Good fellow!
>
> I will take a quick look.
Re: Mahalanobis users out there?
Posted by Ted Dunning <te...@gmail.com>.
Good fellow!
I will take a quick look.
On Sun, Mar 6, 2011 at 5:15 AM, Vasil Vasilev <va...@gmail.com> wrote:
> Hi Ted,
>
> The code above is an example of how to use MahalanobisDistanceMeasure. For
> the problems I ran into, I created a Jira issue and attached a patch to it:
> https://issues.apache.org/jira/browse/MAHOUT-616
>
> Regards, Vasil
Re: Mahalanobis users out there?
Posted by Vasil Vasilev <va...@gmail.com>.
Hi Ted,
The code above is an example of how to use MahalanobisDistanceMeasure. For
the problems I ran into, I created a Jira issue and attached a patch to it:
https://issues.apache.org/jira/browse/MAHOUT-616
Regards, Vasil
On Tue, Mar 1, 2011 at 7:26 PM, Ted Dunning <te...@gmail.com> wrote:
> Vasil,
>
> If you are suggesting a change in Mahout, can you go to
> https://issues.apache.org/jira/browse/MAHOUT and file an issue with a
> patch?
>
> In case the terminology is new for you, an issue is a bug report or
> enhancement request and a patch is
> the output of svn diff or git format-patch.
>
> You can get more information about this process here:
> https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute
Re: Mahalanobis users out there?
Posted by Ted Dunning <te...@gmail.com>.
Vasil,
If you are suggesting a change in Mahout, can you go to
https://issues.apache.org/jira/browse/MAHOUT and file an issue with a
patch?
In case the terminology is new for you, an issue is a bug report or
enhancement request, and a patch is the output of svn diff or
git format-patch.
You can get more information about this process here:
https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute
On Tue, Mar 1, 2011 at 1:11 AM, Vasil Vasilev <va...@gmail.com> wrote:
> Hi Lance,
>
> I did a small test with the Mahalanobis distance measure and Dirichlet
> clustering. Unfortunately, it did not work the first time because the
> measure's "configure" method was never called.
Re: Mahalanobis users out there?
Posted by Vasil Vasilev <va...@gmail.com>.
Hi Lance,
I did a small test with the Mahalanobis distance measure and Dirichlet
clustering. Unfortunately, it did not work the first time because the
measure's "configure" method was never called.
I made some changes to the Mahout code to get it running and used the
following code in the
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job class:
/**
 * Run the job using supplied arguments, deleting the output directory if
 * it exists beforehand
 *
 * @param input
 *          the directory pathname for input points
 * @param output
 *          the directory pathname for output points
 * @param modelDistribution
 *          the ModelDistribution
 * @param numModels
 *          the number of Models
 * @param maxIterations
 *          the maximum number of iterations
 * @param alpha0
 *          the alpha0 value for the DirichletDistribution
 */
public void run(Path input,
                Path output,
                ModelDistribution<VectorWritable> modelDistribution,
                int numModels,
                int maxIterations,
                double alpha0,
                boolean emitMostLikely,
                double threshold)
    throws IOException, ClassNotFoundException, InstantiationException,
           IllegalAccessException, SecurityException, InterruptedException {
  Configuration conf = new Configuration();

  if (modelDistribution instanceof DistanceMeasureClusterDistribution) {
    DistanceMeasure measure =
        ((DistanceMeasureClusterDistribution) modelDistribution).getMeasure();
    if (measure instanceof MahalanobisDistanceMeasure) {
      Vector meanVector = new DenseVector(new double[] {0.0, 22.0, 25.0});
      ((MahalanobisDistanceMeasure) measure).setMeanVector(meanVector);

      Matrix m = new DenseMatrix(new double[][] {
          {1.0, 0.0, 0.0},
          {0.0, 1.0, 0.0},
          {0.0, 0.0, 1.0}});
      ((MahalanobisDistanceMeasure) measure).setCovarianceMatrix(m);

      Path inverseCovarianceFile =
          new Path("output/MahalanobisDistanceMeasureInverseCovarianceFile");
      conf.set("MahalanobisDistanceMeasure.inverseCovarianceFile",
               "output/MahalanobisDistanceMeasureInverseCovarianceFile");
      FileSystem fs = FileSystem.get(inverseCovarianceFile.toUri(), conf);
      MatrixWritable inverseCovarianceMatrix = new MatrixWritable(
          ((MahalanobisDistanceMeasure) measure).getInverseCovarianceMatrix());
      DataOutputStream out = fs.create(inverseCovarianceFile);
      try {
        inverseCovarianceMatrix.write(out);
      } finally {
        out.close();
      }

      Path meanVectorFile =
          new Path("output/MahalanobisDistanceMeasureMeanVectorFile");
      conf.set("MahalanobisDistanceMeasure.meanVectorFile",
               "output/MahalanobisDistanceMeasureMeanVectorFile");
      fs = FileSystem.get(meanVectorFile.toUri(), conf);
      VectorWritable meanVectorWritable = new VectorWritable(meanVector);
      out = fs.create(meanVectorFile);
      try {
        meanVectorWritable.write(out);
      } finally {
        out.close();
      }

      conf.set("MahalanobisDistanceMeasure.maxtrixClass",
               MatrixWritable.class.getName());
      conf.set("MahalanobisDistanceMeasure.vectorClass",
               VectorWritable.class.getName());
    }
  }

  Path directoryContainingConvertedInput =
      new Path(output, DIRECTORY_CONTAINING_CONVERTED_INPUT);
  SynthInputDriver.runJob(input, directoryContainingConvertedInput,
                          "org.apache.mahout.math.RandomAccessSparseVector");
  // InputDriver.runJob(input, directoryContainingConvertedInput,
  //                    "org.apache.mahout.math.RandomAccessSparseVector");
  DirichletDriver.run(conf,
                      directoryContainingConvertedInput,
                      output,
                      modelDistribution,
                      numModels,
                      maxIterations,
                      alpha0,
                      true,
                      emitMostLikely,
                      threshold,
                      true);

  try {
    ClusteredPointsConverter.convertClusteredPoints(
        directoryContainingConvertedInput,
        new Path(output, "clusteredPoints"),
        new Path(output, "convertedClusteredPoints"),
        "org.apache.mahout.math.RandomAccessSparseVector");
  } catch (InvocationTargetException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
  }

  // run ClusterDumper
  ClusterDumper clusterDumper =
      new ClusterDumper(new Path(output, "clusters-" + maxIterations),
                        new Path(output, "convertedClusteredPoints"));
  clusterDumper.printClusters(null);
}
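
For anyone unsure what the mean vector and covariance matrix feed into,
here is a minimal sketch of the textbook Mahalanobis distance,
sqrt((x - mean)^T * S^-1 * (x - mean)), in plain Java with no Mahout
dependency. The class name MahalanobisSketch, the method name, and the
sample point are made up for illustration; this is not Mahout's
implementation, only the formula as I understand it. It reuses the mean
vector and the identity covariance from the code above.

```java
// Illustrative sketch only: hypothetical class, not part of Mahout.
final class MahalanobisSketch {

    /**
     * Returns sqrt((x - mean)^T * inverseCovariance * (x - mean)).
     * The caller supplies the already inverted covariance matrix.
     */
    static double distance(double[] x, double[] mean, double[][] inverseCovariance) {
        int n = x.length;
        if (mean.length != n || inverseCovariance.length != n) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        // Difference between the point and the mean.
        double[] diff = new double[n];
        for (int i = 0; i < n; i++) {
            diff[i] = x[i] - mean[i];
        }
        // Quadratic form diff^T * inverseCovariance * diff.
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            if (inverseCovariance[i].length != n) {
                throw new IllegalArgumentException("matrix must be square");
            }
            double rowDot = 0.0;
            for (int j = 0; j < n; j++) {
                rowDot += inverseCovariance[i][j] * diff[j];
            }
            sum += diff[i] * rowDot;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] mean = {0.0, 22.0, 25.0};      // mean vector from the post
        double[][] identity = {                 // inverse of the identity covariance
            {1.0, 0.0, 0.0},
            {0.0, 1.0, 0.0},
            {0.0, 0.0, 1.0}};
        double[] point = {3.0, 26.0, 25.0};     // hypothetical sample point
        System.out.println(distance(point, mean, identity)); // prints 5.0
    }
}
```

With an identity covariance the result is just the Euclidean distance
from the mean, so the example above only exercises the plumbing. A
covariance estimated from real data is what makes the measure weight
each dimension differently.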
On Tue, Mar 1, 2011 at 10:12 AM, Lance Norskog <go...@gmail.com> wrote:
> Does anybody use the Mahalanobis distance measure class? If so, what for?
> And how do you prepare the input matrices?
>
> Lance
>