Posted to common-user@hadoop.apache.org by Camilo Arango <ca...@gmail.com> on 2007/12/02 02:37:56 UTC
Help with custom key Type
Hi,
I am trying to define a custom key type in Hadoop (version 0.15.0).
This is what my class looks like:
public class ClassAttributeValueKey implements WritableComparable {

    public int classification;
    public int attribute;
    public int value;

    public ClassAttributeValueKey() {
    }

    public ClassAttributeValueKey(int classification, int attribute, int value) {
        this.attribute = attribute;
        this.value = value;
        this.classification = classification;
    }

    /* (non-Javadoc)
     * @see org.apache.hadoop.io.Writable#readFields(java.io.DataInput)
     */
    public void readFields(DataInput in) throws IOException {
        attribute = in.readInt();
        value = in.readInt();
        classification = in.readInt();
    }

    /* (non-Javadoc)
     * @see org.apache.hadoop.io.Writable#write(java.io.DataOutput)
     */
    public void write(DataOutput out) throws IOException {
        out.write(attribute);
        out.write(value);
        out.write(classification);
    }

    /* (non-Javadoc)
     * @see java.lang.Comparable#compareTo(java.lang.Object)
     */
    public int compareTo(Object obj) {
        ClassAttributeValueKey o = (ClassAttributeValueKey) obj;
        int dif = classification - o.classification;
        if (dif == 0) {
            int dif2 = attribute - o.attribute;
            if (dif2 == 0) {
                return value - o.value;
            }
            return dif2;
        }
        return dif;
    }

    public int getClassification() {
        return classification;
    }

    public int getAttribute() {
        return attribute;
    }

    public int getValue() {
        return value;
    }

    public String toString() {
        return String.format("{class: %d attribute: %d value: %d}",
                classification, attribute, value);
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + attribute;
        result = prime * result + classification;
        result = prime * result + value;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        final ClassAttributeValueKey other = (ClassAttributeValueKey) obj;
        if (attribute != other.attribute)
            return false;
        if (classification != other.classification)
            return false;
        if (value != other.value)
            return false;
        return true;
    }
}
When I try to use it as the key on the output of my mapper, I get the
following error:
java.lang.RuntimeException: java.io.EOFException
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
    at cmput681.ClassAttributeValueKey$Comparator.compare(ClassAttributeValueKey.java:123)
    at org.apache.hadoop.mapred.BasicTypeSorterBase.compare(BasicTypeSorterBase.java:133)
    at org.apache.hadoop.mapred.MergeSorter.compare(MergeSorter.java:59)
    at org.apache.hadoop.mapred.MergeSorter.compare(MergeSorter.java:35)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:46)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.mapred.MergeSorter.sort(MergeSorter.java:46)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:396)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:604)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:358)
    at cmput681.ClassAttributeValueKey.readFields(ClassAttributeValueKey.java:40)
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:91)
    ... 20 more
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
    at cmput681.NaiveBayesTool.run(NaiveBayesTool.java:38)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at cmput681.NaiveBayesMain.main(NaiveBayesMain.java:9)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
I don't know if the problem is the implementation of my key class. Please
help me fix it.
Thanks,
C. Arango.
Re: Help with custom key Type
Posted by Camilo Arango <ca...@gmail.com>.
This solved the problem.
Thanks for the prompt answer.
Camilo
On Dec 2, 2007 7:11 AM, Owen O'Malley <oo...@yahoo-inc.com> wrote:
>
> Your guess is right. Those should be out.writeInt. Another suggestion
> is that you should be defining the raw comparators to get better
> performance from the sort. Look at the implementation of IntWritable
> for how to do it.
> [...]
Re: Help with custom key Type
Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Dec 2, 2007, at 2:20 AM, André Martin wrote:
> Hi Camilo,
> probably it's because you are using out.write instead of
> out.writeInt - my guess...
Your guess is right. Those should be out.writeInt. Another suggestion
is that you should be defining the raw comparators to get better
performance from the sort. Look at the implementation of IntWritable
for how to do it.
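To spell out why out.write fails: DataOutput.write(int) emits only the low-order byte of its argument, while writeInt emits all four. readFields() calls readInt() three times and so expects 12 bytes per key, but only 3 were written, hence the EOFException during the sort. A minimal, self-contained sketch of the difference (the field values here are just illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class WriteVsWriteInt {
    // Serialize three example int fields either the buggy way (write)
    // or the correct way (writeInt) and return the resulting bytes.
    static byte[] serialize(boolean useWriteInt) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        int attribute = 5, value = 7, classification = 1;
        if (useWriteInt) {
            out.writeInt(attribute);      // 4 bytes per field: 12 total
            out.writeInt(value);
            out.writeInt(classification);
        } else {
            out.write(attribute);         // 1 byte per field: only 3 total
            out.write(value);
            out.write(classification);
        }
        out.flush();
        return bytes.toByteArray();
    }
}
```

Reading the 3-byte form back with three readInt() calls runs off the end of the buffer, which is exactly the EOFException in the stack trace above.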
A meta-suggestion is to look at the record I/O classes. In particular,
your class would look like:
module org.apache.hadoop.sample {
    class ClassAttributeValueKey {
        int classification;
        int attribute;
        int value;
    }
}
save it to a file named mykey.jr and run bin/rcc mykey.jr to generate
the java source code.
-- Owen
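Owen's raw-comparator suggestion amounts to comparing the serialized bytes of two keys directly instead of deserializing both first. A rough sketch of just the byte-level logic, in plain Java without the Hadoop WritableComparator plumbing (the class and method names are hypothetical; the offsets assume three big-endian ints written in the order attribute, value, classification, matching the poster's write() method):

```java
class RawCompareSketch {
    // Read a 4-byte big-endian int, the format DataOutputStream.writeInt produces.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }

    // Compare two serialized keys without deserializing them.
    // compareTo() orders by classification, then attribute, then value,
    // but the fields are written attribute (offset 0), value (offset 4),
    // classification (offset 8) -- so read at the matching offsets.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int c = Integer.compare(readInt(b1, s1 + 8), readInt(b2, s2 + 8));
        if (c != 0) return c;
        c = Integer.compare(readInt(b1, s1), readInt(b2, s2));
        if (c != 0) return c;
        return Integer.compare(readInt(b1, s1 + 4), readInt(b2, s2 + 4));
    }
}
```

A real implementation would extend WritableComparator, override compare(byte[], int, int, byte[], int, int), and register itself via WritableComparator.define(), the way IntWritable's nested Comparator does.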
Re: Help with custom key Type
Posted by André Martin <ma...@andremartin.de>.
Hi Camilo,
probably it's because you are using out.write instead of out.writeInt -
my guess...
Cu on the 'net,
Bye - bye,
<<<<< André <<<< >>>> èrbnA >>>>>
Camilo Arango wrote:
> Hi,
> I am trying to define a custom key type in hadoop (version 0.15.0)
> [...]