Posted to common-user@hadoop.apache.org by Camilo Arango <ca...@gmail.com> on 2007/12/02 02:37:56 UTC

Help with custom key Type

Hi,
I am trying to define a custom key type in Hadoop (version 0.15.0).

This is what my class looks like:

public class ClassAttributeValueKey implements WritableComparable {

  public int classification;
  public int attribute;
  public int value;

  public ClassAttributeValueKey() {
  }

  public ClassAttributeValueKey(int classification, int attribute, int value) {
    this.attribute = attribute;
    this.value = value;
    this.classification = classification;
  }

  /* (non-Javadoc)
   * @see org.apache.hadoop.io.Writable#readFields(java.io.DataInput)
   */
  public void readFields(DataInput in) throws IOException {
    attribute = in.readInt();
    value = in.readInt();
    classification = in.readInt();
  }

  /* (non-Javadoc)
   * @see org.apache.hadoop.io.Writable#write(java.io.DataOutput)
   */
  public void write(DataOutput out) throws IOException {
    out.write(attribute);
    out.write(value);
    out.write(classification);
  }

  /* (non-Javadoc)
   * @see java.lang.Comparable#compareTo(java.lang.Object)
   */
  public int compareTo(Object obj) {
    ClassAttributeValueKey o = (ClassAttributeValueKey) obj;
    int dif = classification - o.classification;
    if (dif == 0) {
      int dif2 = attribute - o.attribute;
      if (dif2 == 0) {
        return value - o.value;
      }
      return dif2;
    }
    return dif;
  }

  public int getClassification() {
    return classification;
  }

  public int getAttribute() {
    return attribute;
  }

  public int getValue() {
    return value;
  }

  public String toString() {
    return String.format("{class: %d attribute: %d value: %d}",
        classification, attribute, value);
  }

  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + attribute;
    result = prime * result + classification;
    result = prime * result + value;
    return result;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj)
      return true;
    if (obj == null)
      return false;
    if (getClass() != obj.getClass())
      return false;
    final ClassAttributeValueKey other = (ClassAttributeValueKey) obj;
    if (attribute != other.attribute)
      return false;
    if (classification != other.classification)
      return false;
    if (value != other.value)
      return false;
    return true;
  }

}


When I try to use it as the key on the output of my mapper, I get the
following error:


java.lang.RuntimeException: java.io.EOFException
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
    at cmput681.ClassAttributeValueKey$Comparator.compare(ClassAttributeValueKey.java:123)
    at org.apache.hadoop.mapred.BasicTypeSorterBase.compare(BasicTypeSorterBase.java:133)
    at org.apache.hadoop.mapred.MergeSorter.compare(MergeSorter.java:59)
    at org.apache.hadoop.mapred.MergeSorter.compare(MergeSorter.java:35)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:46)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.util.MergeSort.mergeSort(MergeSort.java:55)
    at org.apache.hadoop.mapred.MergeSorter.sort(MergeSorter.java:46)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:396)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:604)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:358)
    at cmput681.ClassAttributeValueKey.readFields(ClassAttributeValueKey.java:40)
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:91)
    ... 20 more
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
    at cmput681.NaiveBayesTool.run(NaiveBayesTool.java:38)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at cmput681.NaiveBayesMain.main(NaiveBayesMain.java:9)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)


I don't know if the problem is in the implementation of my key class. Please
help me fix it.

Thanks,

C. Arango.

Re: Help with custom key Type

Posted by Camilo Arango <ca...@gmail.com>.
This solved the problem.
Thanks for the prompt answer.

Camilo

On Dec 2, 2007 7:11 AM, Owen O'Malley <oo...@yahoo-inc.com> wrote:

> [...]

Re: Help with custom key Type

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Dec 2, 2007, at 2:20 AM, André Martin wrote:

> Hi Camilo,
> probably it's because you are using out.write instead of  
> out.writeInt - my guess...

Your guess is right. Those should be out.writeInt. Another suggestion  
is that you should be defining the raw comparators to get better  
performance from the sort. Look at the implementation of IntWritable  
for how to do it.
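
For this key, a raw comparator modeled on the IntWritable one might look
roughly like the sketch below. It is only a sketch: it assumes write() uses
writeInt() for all three fields in the order attribute, value, classification
(a fixed 12 bytes per key), and it compares classification, then attribute,
then value so that it agrees with compareTo(). The nested class name follows
the IntWritable pattern; the compareInts helper is just an illustrative name.

// Requires: import org.apache.hadoop.io.WritableComparator;
// Nested inside ClassAttributeValueKey; compares the serialized bytes
// directly instead of deserializing both keys for every comparison.
public static class Comparator extends WritableComparator {

  public Comparator() {
    super(ClassAttributeValueKey.class);
  }

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    // classification is the third serialized int (offset 8)
    int cmp = compareInts(readInt(b1, s1 + 8), readInt(b2, s2 + 8));
    if (cmp != 0) {
      return cmp;
    }
    // attribute is the first serialized int (offset 0)
    cmp = compareInts(readInt(b1, s1), readInt(b2, s2));
    if (cmp != 0) {
      return cmp;
    }
    // value is the second serialized int (offset 4)
    return compareInts(readInt(b1, s1 + 4), readInt(b2, s2 + 4));
  }

  private static int compareInts(int a, int b) {
    return a < b ? -1 : (a == b ? 0 : 1);
  }
}

// Register the raw comparator once, e.g. in a static block of the key class:
static {
  WritableComparator.define(ClassAttributeValueKey.class, new Comparator());
}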

A meta-suggestion is to look at the record io class. In particular,  
your class would look like:

module org.apache.hadoop.sample {
   class ClassAttributeValueKey {
     int classification;
     int attribute;
     int value;
   }
}

save it to a file named mykey.jr and run bin/rcc mykey.jr to generate  
the java source code.
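
Either way, the resulting class is wired into the job the same way. As a
rough sketch with the old mapred API (org.apache.hadoop.mapred.JobConf and
JobClient), using placeholder class names borrowed from the stack trace
above; the value type is just an assumption:

// Sketch only: this would sit in the tool's run() method. Mapper, reducer
// and input/output paths are omitted because they are unaffected by the
// choice of key implementation.
JobConf conf = new JobConf(NaiveBayesTool.class);          // placeholder driver class
conf.setMapOutputKeyClass(ClassAttributeValueKey.class);   // the custom key
conf.setMapOutputValueClass(IntWritable.class);            // assumption: the mapper emits counts
JobClient.runJob(conf);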

-- Owen

Re: Help with custom key Type

Posted by André Martin <ma...@andremartin.de>.
Hi Camilo,
probably it's because you are using out.write instead of out.writeInt - 
my guess...
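
If that is the cause, a minimal sketch of the fix is to make write() mirror
readFields(): DataOutput.write(int) emits only the low-order byte, so the
serialized key comes out too short and readInt() later hits end-of-stream.

public void write(DataOutput out) throws IOException {
  out.writeInt(attribute);       // was out.write(attribute): 1 byte instead of 4
  out.writeInt(value);
  out.writeInt(classification);  // keeps the order matched with readFields()
}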

Cu on the 'net,
                        Bye - bye,

                                   <<<<< André <<<< >>>> èrbnA >>>>>

Camilo Arango wrote:
> [...]