Posted to mapreduce-user@hadoop.apache.org by Jim Twensky <ji...@gmail.com> on 2013/05/12 20:24:48 UTC

Wrapping around BitSet with the Writable interface

I have large java.util.BitSet objects that I want to bitwise-OR using a
MapReduce job. I decided to wrap around each object using the Writable
interface. Right now I convert each BitSet to a byte array and serialize
the byte array on disk.

Converting them to byte arrays is a bit inefficient, but I could not find a
workaround to write them directly to the DataOutput. Is there a way to
skip this and serialize the object directly? Here is what my current
implementation looks like:

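import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

import org.apache.hadoop.io.Writable;

public class BitSetWritable implements Writable {

  private BitSet bs;

  public BitSetWritable() {
    this.bs = new BitSet();
  }

  @Override
  public void write(DataOutput out) throws IOException {

    // Serialize the BitSet into an in-memory buffer using Java serialization.
    ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
    ObjectOutputStream oos = new ObjectOutputStream(bos);
    oos.writeObject(bs);
    // Close (and thereby flush) before reading the buffer; ObjectOutputStream
    // buffers internally, so calling toByteArray() first can drop trailing bytes.
    oos.close();
    byte[] bytes = bos.toByteArray();

    // Length-prefix the payload so readFields() knows how much to read.
    out.writeInt(bytes.length);
    out.write(bytes);

  }

  @Override
  public void readFields(DataInput in) throws IOException {

    int len = in.readInt();
    byte[] bytes = new byte[len];
    in.readFully(bytes);

    ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
    ObjectInputStream ois = new ObjectInputStream(bis);
    try {
      bs = (BitSet) ois.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    } finally {
      ois.close();
    }
  }

}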
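
For context, the bitwise-OR step mentioned at the top of this post could be sketched roughly as below. This is only an illustration of what a reducer body might do with the values for one key; the class name BitSetOr and the method orAll are hypothetical and do not come from the thread or from Hadoop.

```java
import java.util.BitSet;

// Hypothetical helper: combines all BitSets for one key, as a reducer might.
final class BitSetOr {

  /** Returns a new BitSet holding the bitwise OR of all inputs; inputs are left unmodified. */
  static BitSet orAll(Iterable<BitSet> values) {
    BitSet result = new BitSet();
    for (BitSet value : values) {
      result.or(value); // in-place OR into the accumulator
    }
    return result;
  }
}
```

Accumulating into a fresh BitSet avoids mutating the inputs, which matters in Hadoop because value objects handed to a reducer are typically reused between iterations.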

Re: Wrapping around BitSet with the Writable interface

Posted by Jim Twensky <ji...@gmail.com>.
Thanks for the suggestions. I ended up switching to jdk 1.7+ just to make
the code more readable. I will take a look at the EWAH implementation as
well.

Jim
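
A minimal sketch of what the Java 7+ version might look like, using BitSet.toByteArray() and BitSet.valueOf(byte[]) as suggested in the quoted message below. The final code was not posted to the thread, so this is an assumption about its shape. The BitSet-taking constructor and the get() accessor are added here for illustration only, and the Hadoop Writable interface is left off so the sketch compiles with the JDK alone.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.BitSet;

// In a Hadoop project this class would declare
// "implements org.apache.hadoop.io.Writable" and mark both methods @Override.
final class BitSetWritable {

  private BitSet bs;

  BitSetWritable() {
    this.bs = new BitSet();
  }

  // Hypothetical convenience constructor, not part of the original post.
  BitSetWritable(BitSet bs) {
    this.bs = bs;
  }

  // Hypothetical accessor, not part of the original post.
  BitSet get() {
    return bs;
  }

  public void write(DataOutput out) throws IOException {
    // toByteArray() (Java 7+) returns the bits up to the highest set bit.
    byte[] bytes = bs.toByteArray();
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  public void readFields(DataInput in) throws IOException {
    int len = in.readInt();
    byte[] bytes = new byte[len];
    in.readFully(bytes);
    // valueOf(byte[]) (Java 7+) rebuilds the BitSet from the raw bytes.
    bs = BitSet.valueOf(bytes);
  }
}
```

This still copies the bits into a temporary byte array, but it avoids the ObjectOutputStream header and class metadata that Java serialization adds to every record.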


On Sun, May 12, 2013 at 3:40 PM, Bertrand Dechoux <de...@gmail.com>wrote:

> You can disregard my links, as they are only valid for Java 1.7+.
> JavaSerialization might clean up your code but shouldn't bring a
> significant boost in performance.
> The EWAH implementation has, at least, the methods you are looking for:
> serialize / deserialize.
>
> Regards
>
> Bertrand
>
> Note to myself : I have to remember this one.
>
>
> On Sun, May 12, 2013 at 10:27 PM, Ted Dunning <td...@maprtech.com>wrote:
>
>> Another interesting alternative is the EWAH implementation of Java
>> bitsets, which provides efficient compressed bitsets with very fast OR
>> operations.
>>
>> https://github.com/lemire/javaewah
>>
>> See also https://code.google.com/p/sparsebitmap/ by the same authors.
>>
>>
>> On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <de...@gmail.com>wrote:
>>
>>> In order to make the code more readable, you could start by using the
>>> methods toByteArray() and valueOf(bytes)
>>>
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <ji...@gmail.com>wrote:
>>>
>>>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>>>> MapReduce job. I decided to wrap around each object using the Writable
>>>> interface. Right now I convert each BitSet to a byte array and serialize
>>>> the byte array on disk.
>>>>
>>>> Converting them to byte arrays is a bit inefficient, but I could not
>>>> find a workaround to write them directly to the DataOutput. Is there a way
>>>> to skip this and serialize the object directly? Here is what my current
>>>> implementation looks like:
>>>>
>>>> public class BitSetWritable implements Writable {
>>>>
>>>>   private BitSet bs;
>>>>
>>>>   public BitSetWritable() {
>>>>     this.bs = new BitSet();
>>>>   }
>>>>
>>>>   @Override
>>>>   public void write(DataOutput out) throws IOException {
>>>>
>>>>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>>>>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>>>>     oos.writeObject(bs);
>>>>     byte[] bytes = bos.toByteArray();
>>>>     oos.close();
>>>>     out.writeInt(bytes.length);
>>>>     out.write(bytes);
>>>>
>>>>   }
>>>>
>>>>   @Override
>>>>   public void readFields(DataInput in) throws IOException {
>>>>
>>>>     int len = in.readInt();
>>>>     byte[] bytes = new byte[len];
>>>>     in.readFully(bytes);
>>>>
>>>>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>>>     ObjectInputStream ois = new ObjectInputStream(bis);
>>>>     try {
>>>>       bs = (BitSet) ois.readObject();
>>>>     } catch (ClassNotFoundException e) {
>>>>       throw new IOException(e);
>>>>     }
>>>>
>>>>     ois.close();
>>>>   }
>>>>
>>>> }
>>>>
>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>>
>
>
> --
> Bertrand Dechoux
>

Re: Wrapping around BitSet with the Writable interface

Posted by Bertrand Dechoux <de...@gmail.com>.
You can disregard my links, as they are only valid for Java 1.7+.
JavaSerialization might clean up your code but shouldn't bring a
significant boost in performance.
The EWAH implementation has, at least, the methods you are looking for:
serialize / deserialize.
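
If staying on a pre-1.7 JDK, one possible way to write a BitSet straight to a DataOutput without an intermediate byte array is to emit the indices of the set bits. The sketch below is an assumption about how that could be done, not something proposed in the thread; the class name BitSetIO is hypothetical. It relies only on cardinality() and nextSetBit(), which predate Java 7.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical helper: writes a BitSet as a count followed by set-bit indices.
final class BitSetIO {

  static void write(BitSet bs, DataOutput out) throws IOException {
    out.writeInt(bs.cardinality()); // number of indices that follow
    for (int i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
      out.writeInt(i);
      if (i == Integer.MAX_VALUE) {
        break; // avoid overflow of i + 1
      }
    }
  }

  static BitSet read(DataInput in) throws IOException {
    int count = in.readInt();
    BitSet bs = new BitSet();
    for (int n = 0; n < count; n++) {
      bs.set(in.readInt());
    }
    return bs;
  }
}
```

This costs four bytes per set bit, so it is only compact for sparse sets; for dense sets the raw byte form, or a compressed bitmap such as EWAH, is likely the better choice.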

Regards

Bertrand

Note to myself : I have to remember this one.


On Sun, May 12, 2013 at 10:27 PM, Ted Dunning <td...@maprtech.com> wrote:

> Another interesting alternative is the EWAH implementation of Java bitsets,
> which provides efficient compressed bitsets with very fast OR operations.
>
> https://github.com/lemire/javaewah
>
> See also https://code.google.com/p/sparsebitmap/ by the same authors.
>
>
> On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <de...@gmail.com>wrote:
>
>> In order to make the code more readable, you could start by using the
>> methods toByteArray() and valueOf(bytes)
>>
>>
>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>>
>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>>
>> Regards
>>
>> Bertrand
>>
>>
>> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <ji...@gmail.com>wrote:
>>
>>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>>> MapReduce job. I decided to wrap around each object using the Writable
>>> interface. Right now I convert each BitSet to a byte array and serialize
>>> the byte array on disk.
>>>
>>> Converting them to byte arrays is a bit inefficient but I could not find
>>> a work around to write them directly to the DataOutput. Is there a way to
>>> skip this and serialize the object directly? Here is what my current
>>> implementation looks like:
>>>
>>> public class BitSetWritable implements Writable {
>>>
>>>   private BitSet bs;
>>>
>>>   public BitSetWritable() {
>>>     this.bs = new BitSet();
>>>   }
>>>
>>>   @Override
>>>   public void write(DataOutput out) throws IOException {
>>>
>>>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>>>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>>>     oos.writeObject(bs);
>>>     byte[] bytes = bos.toByteArray();
>>>     oos.close();
>>>     out.writeInt(bytes.length);
>>>     out.write(bytes);
>>>
>>>   }
>>>
>>>   @Override
>>>   public void readFields(DataInput in) throws IOException {
>>>
>>>     int len = in.readInt();
>>>     byte[] bytes = new byte[len];
>>>     in.readFully(bytes);
>>>
>>>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>>     ObjectInputStream ois = new ObjectInputStream(bis);
>>>     try {
>>>       bs = (BitSet) ois.readObject();
>>>     } catch (ClassNotFoundException e) {
>>>       throw new IOException(e);
>>>     }
>>>
>>>     ois.close();
>>>   }
>>>
>>> }
>>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>
>


-- 
Bertrand Dechoux

Re: Wrapping around BitSet with the Writable interface

Posted by Ted Dunning <td...@maprtech.com>.
Another interesting alternative is the EWAH implementation of Java bitsets,
which provides efficient compressed bitsets with very fast OR operations.

https://github.com/lemire/javaewah

See also https://code.google.com/p/sparsebitmap/ by the same authors.


On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <de...@gmail.com>wrote:

> In order to make the code more readable, you could start by using the
> methods toByteArray() and valueOf(bytes)
>
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>
> Regards
>
> Bertrand
>
>
> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <ji...@gmail.com>wrote:
>
>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>> MapReduce job. I decided to wrap around each object using the Writable
>> interface. Right now I convert each BitSet to a byte array and serialize
>> the byte array on disk.
>>
>> Converting them to byte arrays is a bit inefficient but I could not find
>> a work around to write them directly to the DataOutput. Is there a way to
>> skip this and serialize the object directly? Here is what my current
>> implementation looks like:
>>
>> public class BitSetWritable implements Writable {
>>
>>   private BitSet bs;
>>
>>   public BitSetWritable() {
>>     this.bs = new BitSet();
>>   }
>>
>>   @Override
>>   public void write(DataOutput out) throws IOException {
>>
>>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>>     oos.writeObject(bs);
>>     byte[] bytes = bos.toByteArray();
>>     oos.close();
>>     out.writeInt(bytes.length);
>>     out.write(bytes);
>>
>>   }
>>
>>   @Override
>>   public void readFields(DataInput in) throws IOException {
>>
>>     int len = in.readInt();
>>     byte[] bytes = new byte[len];
>>     in.readFully(bytes);
>>
>>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>     ObjectInputStream ois = new ObjectInputStream(bis);
>>     try {
>>       bs = (BitSet) ois.readObject();
>>     } catch (ClassNotFoundException e) {
>>       throw new IOException(e);
>>     }
>>
>>     ois.close();
>>   }
>>
>> }
>>
>
>
>
> --
> Bertrand Dechoux
>

Re: Wrapping around BitSet with the Writable interface

Posted by Ted Dunning <td...@maprtech.com>.
Another interesting alternative is the EWAH implementation of java bitsets
that allow efficient compressed bitsets with very fast OR operations.

https://github.com/lemire/javaewah

See also https://code.google.com/p/sparsebitmap/ by the same authors.


On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <de...@gmail.com>wrote:

> In order to make the code more readable, you could start by using the
> methods toByteArray() and valueOf(bytes)
>
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>
> Regards
>
> Bertrand
>
>
> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <ji...@gmail.com>wrote:
>
>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>> MapReduce job. I decided to wrap around each object using the Writable
>> interface. Right now I convert each BitSet to a byte array and serialize
>> the byte array on disk.
>>
>> Converting them to byte arrays is a bit inefficient but I could not find
>> a work around to write them directly to the DataOutput. Is there a way to
>> skip this and serialize the object directly? Here is what my current
>> implementation looks like:
>>
>> public class BitSetWritable implements Writable {
>>
>>   private BitSet bs;
>>
>>   public BitSetWritable() {
>>     this.bs = new BitSet();
>>   }
>>
>>   @Override
>>   public void write(DataOutput out) throws IOException {
>>
>>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>>     oos.writeObject(bs);
>>     byte[] bytes = bos.toByteArray();
>>     oos.close();
>>     out.writeInt(bytes.length);
>>     out.write(bytes);
>>
>>   }
>>
>>   @Override
>>   public void readFields(DataInput in) throws IOException {
>>
>>     int len = in.readInt();
>>     byte[] bytes = new byte[len];
>>     in.readFully(bytes);
>>
>>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>     ObjectInputStream ois = new ObjectInputStream(bis);
>>     try {
>>       bs = (BitSet) ois.readObject();
>>     } catch (ClassNotFoundException e) {
>>       throw new IOException(e);
>>     }
>>
>>     ois.close();
>>   }
>>
>> }
>>
>
>
>
> --
> Bertrand Dechoux
>

Re: Wrapping around BitSet with the Writable interface

Posted by Ted Dunning <td...@maprtech.com>.
Another interesting alternative is the EWAH implementation of java bitsets
that allow efficient compressed bitsets with very fast OR operations.

https://github.com/lemire/javaewah

See also https://code.google.com/p/sparsebitmap/ by the same authors.


On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <de...@gmail.com>wrote:

> In order to make the code more readable, you could start by using the
> methods toByteArray() and valueOf(bytes)
>
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>
> Regards
>
> Bertrand
>
>
> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <ji...@gmail.com>wrote:
>
>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>> MapReduce job. I decided to wrap around each object using the Writable
>> interface. Right now I convert each BitSet to a byte array and serialize
>> the byte array on disk.
>>
>> Converting them to byte arrays is a bit inefficient but I could not find
>> a work around to write them directly to the DataOutput. Is there a way to
>> skip this and serialize the object directly? Here is what my current
>> implementation looks like:
>>
>> public class BitSetWritable implements Writable {
>>
>>   private BitSet bs;
>>
>>   public BitSetWritable() {
>>     this.bs = new BitSet();
>>   }
>>
>>   @Override
>>   public void write(DataOutput out) throws IOException {
>>
>>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>>     oos.writeObject(bs);
>>     byte[] bytes = bos.toByteArray();
>>     oos.close();
>>     out.writeInt(bytes.length);
>>     out.write(bytes);
>>
>>   }
>>
>>   @Override
>>   public void readFields(DataInput in) throws IOException {
>>
>>     int len = in.readInt();
>>     byte[] bytes = new byte[len];
>>     in.readFully(bytes);
>>
>>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>>     ObjectInputStream ois = new ObjectInputStream(bis);
>>     try {
>>       bs = (BitSet) ois.readObject();
>>     } catch (ClassNotFoundException e) {
>>       throw new IOException(e);
>>     }
>>
>>     ois.close();
>>   }
>>
>> }
>>
>
>
>
> --
> Bertrand Dechoux
>

Re: Wrapping around BitSet with the Writable interface

Posted by Bertrand Dechoux <de...@gmail.com>.
To make the code more readable, you could start by using the BitSet
methods toByteArray() and valueOf(bytes):

http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
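
A minimal sketch of how the class might look with those two methods, assuming Java 7 or later. To keep it runnable without Hadoop on the classpath, the class below does not declare "implements Writable"; its write() and readFields() methods use the same signatures, so adding the interface should be the only change needed in a real job. The extra constructor, get() and roundTrip() are hypothetical helpers added for local testing only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Sketch only: in a real job, add "implements org.apache.hadoop.io.Writable".
class BitSetWritable {

  private BitSet bs = new BitSet();

  BitSetWritable() {}

  BitSetWritable(BitSet bs) {
    this.bs = bs;
  }

  BitSet get() {
    return bs;
  }

  public void write(DataOutput out) throws IOException {
    // toByteArray() gives a little-endian byte view of the bits,
    // trimmed to the highest set bit.
    byte[] bytes = bs.toByteArray();
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  public void readFields(DataInput in) throws IOException {
    int len = in.readInt();
    byte[] bytes = new byte[len];
    in.readFully(bytes);
    bs = BitSet.valueOf(bytes);
  }

  // Hypothetical test helper: writes to memory and reads back into a new object.
  static BitSetWritable roundTrip(BitSetWritable w) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      w.write(new DataOutputStream(bos));
      BitSetWritable copy = new BitSetWritable();
      copy.readFields(
          new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Note that this changes the bytes on disk: data written with the ObjectOutputStream version cannot be read back by this readFields().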

Regards

Bertrand


On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <ji...@gmail.com> wrote:

> I have large java.util.BitSet objects that I want to bitwise-OR using a
> MapReduce job. I decided to wrap around each object using the Writable
> interface. Right now I convert each BitSet to a byte array and serialize
> the byte array on disk.
>
> Converting them to byte arrays is a bit inefficient but I could not find a
> work around to write them directly to the DataOutput. Is there a way to
> skip this and serialize the object directly? Here is what my current
> implementation looks like:
>
> public class BitSetWritable implements Writable {
>
>   private BitSet bs;
>
>   public BitSetWritable() {
>     this.bs = new BitSet();
>   }
>
>   @Override
>   public void write(DataOutput out) throws IOException {
>
>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>     oos.writeObject(bs);
>     byte[] bytes = bos.toByteArray();
>     oos.close();
>     out.writeInt(bytes.length);
>     out.write(bytes);
>
>   }
>
>   @Override
>   public void readFields(DataInput in) throws IOException {
>
>     int len = in.readInt();
>     byte[] bytes = new byte[len];
>     in.readFully(bytes);
>
>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>     ObjectInputStream ois = new ObjectInputStream(bis);
>     try {
>       bs = (BitSet) ois.readObject();
>     } catch (ClassNotFoundException e) {
>       throw new IOException(e);
>     }
>
>     ois.close();
>   }
>
> }
>



-- 
Bertrand Dechoux

Re: Wrapping around BitSet with the Writable interface

Posted by Harsh J <ha...@cloudera.com>.
You could perhaps use the experimental JavaSerialization [1] support to
skip converting to Writables or other serialization formats. It may be
slower, but it looks like you want a way to avoid transforming objects.

Enable it by adding the class
org.apache.hadoop.io.serializer.JavaSerialization to the list of
io.serializations in your client configuration, like so:

<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization,org.apache.hadoop.io.serializer.JavaSerialization</value>
</property>

You should then be able to rely on Java's built-in serialization to
serialize your BitSet object directly.

[1] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/serializer/JavaSerialization.html
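
To get a rough feel for the overhead of the built-in serialization, here is a small sketch that compares the number of bytes ObjectOutputStream produces for a BitSet with the length of its raw toByteArray() form (Java 7+). It uses only the JDK, so it says nothing about what Hadoop adds on top; SerializationSize and its method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.BitSet;

class SerializationSize {

  // Number of bytes produced by Java's built-in serialization.
  static int javaSerializedSize(BitSet bs) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
        oos.writeObject(bs);
      } // closing the stream flushes any buffered bytes into bos
      return bos.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Number of bytes in the raw little-endian byte form of the bits.
  static int rawSize(BitSet bs) {
    return bs.toByteArray().length;
  }
}
```

The serialized form carries a stream header and class metadata in addition to the bits, so it comes out larger than the raw bytes; whether that matters depends on how large and how numerous the BitSets are.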

On Sun, May 12, 2013 at 11:54 PM, Jim Twensky <ji...@gmail.com> wrote:
> I have large java.util.BitSet objects that I want to bitwise-OR using a
> MapReduce job. I decided to wrap around each object using the Writable
> interface. Right now I convert each BitSet to a byte array and serialize the
> byte array on disk.
>
> Converting them to byte arrays is a bit inefficient but I could not find a
> work around to write them directly to the DataOutput. Is there a way to skip
> this and serialize the object directly? Here is what my current
> implementation looks like:
>
> public class BitSetWritable implements Writable {
>
>   private BitSet bs;
>
>   public BitSetWritable() {
>     this.bs = new BitSet();
>   }
>
>   @Override
>   public void write(DataOutput out) throws IOException {
>
>     ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>     ObjectOutputStream oos = new ObjectOutputStream(bos);
>     oos.writeObject(bs);
>     byte[] bytes = bos.toByteArray();
>     oos.close();
>     out.writeInt(bytes.length);
>     out.write(bytes);
>
>   }
>
>   @Override
>   public void readFields(DataInput in) throws IOException {
>
>     int len = in.readInt();
>     byte[] bytes = new byte[len];
>     in.readFully(bytes);
>
>     ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>     ObjectInputStream ois = new ObjectInputStream(bis);
>     try {
>       bs = (BitSet) ois.readObject();
>     } catch (ClassNotFoundException e) {
>       throw new IOException(e);
>     }
>
>     ois.close();
>   }
>
> }



--
Harsh J
