Posted to mapreduce-user@hadoop.apache.org by Jim Twensky <ji...@gmail.com> on 2013/05/12 20:24:48 UTC
Wrapping around BitSet with the Writable interface
I have large java.util.BitSet objects that I want to bitwise-OR using a
MapReduce job. I decided to wrap each object in a class that implements the
Writable interface. Right now I convert each BitSet to a byte array and
serialize the byte array to disk.
Converting them to byte arrays is a bit inefficient, but I could not find a
workaround to write them directly to the DataOutput. Is there a way to
skip this step and serialize the object directly? Here is what my current
implementation looks like:
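import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

import org.apache.hadoop.io.Writable;

public class BitSetWritable implements Writable {

    private BitSet bs;

    public BitSetWritable() {
        this.bs = new BitSet();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Java-serialize the BitSet into an in-memory buffer first.
        ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size() / 8);
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(bs);
        oos.close(); // flush everything into bos before copying it out
        byte[] bytes = bos.toByteArray();
        // Length-prefix the payload so readFields knows how much to read.
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        int len = in.readInt();
        byte[] bytes = new byte[len];
        in.readFully(bytes);

        ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
        ObjectInputStream ois = new ObjectInputStream(bis);
        try {
            bs = (BitSet) ois.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
        ois.close();
    }
}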
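For context, the OR step that the job is meant to perform can be sketched with plain java.util.BitSet. This is only an illustration: the class name BitSetOr and the method orAll are made up here, and in a real reducer the values would arrive wrapped in the Writable rather than as bare BitSet objects.

```java
import java.util.BitSet;

// Hypothetical helper, not part of the original post.
final class BitSetOr {

    // Returns a new BitSet holding the bitwise OR of all inputs.
    // The inputs themselves are left unmodified.
    static BitSet orAll(Iterable<BitSet> values) {
        BitSet result = new BitSet();
        for (BitSet value : values) {
            result.or(value); // in-place OR into the accumulator
        }
        return result;
    }
}
```

Accumulating into a single result object avoids allocating a new BitSet per input, which may matter when the sets are large.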
Re: Wrapping around BitSet with the Writable interface
Posted by Jim Twensky <ji...@gmail.com>.
Thanks for the suggestions. I ended up switching to JDK 1.7+ just to make
the code more readable. I will take a look at the EWAH implementation as
well.
Jim
On Sun, May 12, 2013 at 3:40 PM, Bertrand Dechoux <de...@gmail.com> wrote:
> You can disregard my links as they are only valid for Java 1.7+.
> JavaSerialization might clean up your code but shouldn't bring a
> significant boost in performance.
> The EWAH implementation has, at least, the methods you are looking for:
> serialize / deserialize.
>
> Regards
>
> Bertrand
>
> Note to myself : I have to remember this one.
>
>
> On Sun, May 12, 2013 at 10:27 PM, Ted Dunning <td...@maprtech.com> wrote:
>
>> Another interesting alternative is the EWAH implementation of Java
>> bitsets, which allows efficient compressed bitsets with very fast OR
>> operations.
>>
>> https://github.com/lemire/javaewah
>>
>> See also https://code.google.com/p/sparsebitmap/ by the same authors.
>>
>>
>> On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <de...@gmail.com>wrote:
>>
>>> In order to make the code more readable, you could start by using the
>>> methods toByteArray() and valueOf(bytes)
>>>
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>>>
>>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>>
>
>
> --
> Bertrand Dechoux
>
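A minimal sketch of what the JDK 1.7+ version might look like, using the toByteArray() and valueOf(byte[]) methods suggested in the thread. The class name Jdk7BitSetWritable and the roundTrip helper are hypothetical and only for illustration. To keep the sketch runnable without Hadoop on the classpath it does not declare "implements Writable", but the two methods have the same signatures as that interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical wrapper; in a Hadoop job it would additionally declare
// "implements org.apache.hadoop.io.Writable".
class Jdk7BitSetWritable {

    private BitSet bs = new BitSet();

    BitSet get() { return bs; }

    void set(BitSet bs) { this.bs = bs; }

    public void write(DataOutput out) throws IOException {
        byte[] bytes = bs.toByteArray(); // Java 7+, little-endian bit layout
        out.writeInt(bytes.length);      // length prefix for readFields
        out.write(bytes);
    }

    public void readFields(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        bs = BitSet.valueOf(bytes);      // Java 7+
    }

    // Illustration only: serialize to memory and read back.
    static BitSet roundTrip(BitSet input) {
        try {
            Jdk7BitSetWritable writer = new Jdk7BitSetWritable();
            writer.set(input);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writer.write(new DataOutputStream(buffer));

            Jdk7BitSetWritable reader = new Jdk7BitSetWritable();
            reader.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return reader.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This still copies the bits into a byte array, so it addresses readability rather than the extra copy. It does avoid the object-stream header and class metadata that Java serialization adds to every record.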
Re: Wrapping around BitSet with the Writable interface
Posted by Bertrand Dechoux <de...@gmail.com>.
You can disregard my links as they are only valid for Java 1.7+.
JavaSerialization might clean up your code but shouldn't bring a
significant boost in performance.
The EWAH implementation has, at least, the methods you are looking for:
serialize / deserialize.
Regards
Bertrand
Note to myself : I have to remember this one.
On Sun, May 12, 2013 at 10:27 PM, Ted Dunning <td...@maprtech.com> wrote:
> Another interesting alternative is the EWAH implementation of Java bitsets,
> which allows efficient compressed bitsets with very fast OR operations.
>
> https://github.com/lemire/javaewah
>
> See also https://code.google.com/p/sparsebitmap/ by the same authors.
>
>
> On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <de...@gmail.com>wrote:
>
>> In order to make the code more readable, you could start by using the
>> methods toByteArray() and valueOf(bytes)
>>
>>
>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>>
>> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>>
>> Regards
>>
>> Bertrand
>>
>> --
>> Bertrand Dechoux
>>
>
>
--
Bertrand Dechoux
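Since toByteArray() and valueOf() need Java 1.7+, one option on an older JDK is to pack the bits into 64-bit words by hand and write them straight to the DataOutput, with no intermediate byte array. The sketch below is an assumption about how that could be done, not something posted in the thread. PackedBitSetIO and its methods are hypothetical names, and the wire format (an int word count followed by that many longs) is invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical helper; the format is: int word count, then that many longs.
final class PackedBitSetIO {

    static void write(BitSet bs, DataOutput out) throws IOException {
        int words = (bs.length() + 63) >>> 6; // 64-bit words needed
        out.writeInt(words);
        for (int w = 0; w < words; w++) {
            int base = w * 64;
            long word = 0L;
            // Visit only the set bits that fall inside this word.
            for (int i = bs.nextSetBit(base); i >= 0 && i - base < 64;
                    i = bs.nextSetBit(i + 1)) {
                word |= 1L << (i - base);
                if (i == Integer.MAX_VALUE) break; // i + 1 would overflow
            }
            out.writeLong(word);
        }
    }

    static BitSet read(DataInput in) throws IOException {
        int words = in.readInt();
        BitSet bs = new BitSet();
        for (int w = 0; w < words; w++) {
            long word = in.readLong();
            int base = w * 64;
            while (word != 0) {
                bs.set(base + Long.numberOfTrailingZeros(word));
                word &= word - 1; // clear the lowest set bit
            }
        }
        return bs;
    }

    // Illustration only: in-memory encode/decode without checked exceptions.
    static byte[] encode(BitSet bs) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(bs, new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static BitSet decode(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The write and read methods use only BitSet calls that predate Java 1.7; the encode and decode helpers use UncheckedIOException (Java 8) purely to make the sketch easy to try out. For dense sets the output is close to one bit per position, while for very sparse sets a compressed format such as EWAH would likely be smaller.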
Re: Wrapping around BitSet with the Writable interface
Posted by Ted Dunning <td...@maprtech.com>.
Another interesting alternative is the EWAH implementation of Java bitsets,
which allows efficient compressed bitsets with very fast OR operations.
https://github.com/lemire/javaewah
See also https://code.google.com/p/sparsebitmap/ by the same authors.
On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <de...@gmail.com>wrote:
> In order to make the code more readable, you could start by using the
> methods toByteArray() and valueOf(bytes)
>
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>
> Regards
>
> Bertrand
>
> --
> Bertrand Dechoux
>
Re: Wrapping around BitSet with the Writable interface
Posted by Ted Dunning <td...@maprtech.com>.
Another interesting alternative is the EWAH implementation of java bitsets
that allow efficient compressed bitsets with very fast OR operations.
https://github.com/lemire/javaewah
See also https://code.google.com/p/sparsebitmap/ by the same authors.
On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <de...@gmail.com>wrote:
> In order to make the code more readable, you could start by using the
> methods toByteArray() and valueOf(bytes)
>
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>
> Regards
>
> Bertrand
>
>
> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <ji...@gmail.com>wrote:
>
>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>> MapReduce job. I decided to wrap around each object using the Writable
>> interface. Right now I convert each BitSet to a byte array and serialize
>> the byte array on disk.
>>
>> Converting them to byte arrays is a bit inefficient but I could not find
>> a work around to write them directly to the DataOutput. Is there a way to
>> skip this and serialize the object directly? Here is what my current
>> implementation looks like:
>>
>> public class BitSetWritable implements Writable {
>>
>> private BitSet bs;
>>
>> public BitSetWritable() {
>> this.bs = new BitSet();
>> }
>>
>> @Override
>> public void write(DataOutput out) throws IOException {
>>
>> ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>> ObjectOutputStream oos = new ObjectOutputStream(bos);
>> oos.writeObject(bs);
>> byte[] bytes = bos.toByteArray();
>> oos.close();
>> out.writeInt(bytes.length);
>> out.write(bytes);
>>
>> }
>>
>> @Override
>> public void readFields(DataInput in) throws IOException {
>>
>> int len = in.readInt();
>> byte[] bytes = new byte[len];
>> in.readFully(bytes);
>>
>> ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>> ObjectInputStream ois = new ObjectInputStream(bis);
>> try {
>> bs = (BitSet) ois.readObject();
>> } catch (ClassNotFoundException e) {
>> throw new IOException(e);
>> }
>>
>> ois.close();
>> }
>>
>> }
>>
>
>
>
> --
> Bertrand Dechoux
>
Re: Wrapping around BitSet with the Writable interface
Posted by Ted Dunning <td...@maprtech.com>.
Another interesting alternative is the EWAH implementation of java bitsets
that allow efficient compressed bitsets with very fast OR operations.
https://github.com/lemire/javaewah
See also https://code.google.com/p/sparsebitmap/ by the same authors.
On Sun, May 12, 2013 at 1:11 PM, Bertrand Dechoux <de...@gmail.com>wrote:
> In order to make the code more readable, you could start by using the
> methods toByteArray() and valueOf(bytes)
>
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
>
> http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
>
> Regards
>
> Bertrand
>
>
> On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <ji...@gmail.com>wrote:
>
>> I have large java.util.BitSet objects that I want to bitwise-OR using a
>> MapReduce job. I decided to wrap around each object using the Writable
>> interface. Right now I convert each BitSet to a byte array and serialize
>> the byte array on disk.
>>
>> Converting them to byte arrays is a bit inefficient but I could not find
>> a work around to write them directly to the DataOutput. Is there a way to
>> skip this and serialize the object directly? Here is what my current
>> implementation looks like:
>>
>> public class BitSetWritable implements Writable {
>>
>> private BitSet bs;
>>
>> public BitSetWritable() {
>> this.bs = new BitSet();
>> }
>>
>> @Override
>> public void write(DataOutput out) throws IOException {
>>
>> ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
>> ObjectOutputStream oos = new ObjectOutputStream(bos);
>> oos.writeObject(bs);
>> byte[] bytes = bos.toByteArray();
>> oos.close();
>> out.writeInt(bytes.length);
>> out.write(bytes);
>>
>> }
>>
>> @Override
>> public void readFields(DataInput in) throws IOException {
>>
>> int len = in.readInt();
>> byte[] bytes = new byte[len];
>> in.readFully(bytes);
>>
>> ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
>> ObjectInputStream ois = new ObjectInputStream(bis);
>> try {
>> bs = (BitSet) ois.readObject();
>> } catch (ClassNotFoundException e) {
>> throw new IOException(e);
>> }
>>
>> ois.close();
>> }
>>
>> }
>>
>
>
>
> --
> Bertrand Dechoux
>
Re: Wrapping around BitSet with the Writable interface
Posted by Bertrand Dechoux <de...@gmail.com>.
To make the code more readable, you could start by using the BitSet
methods toByteArray() and valueOf(bytes), both available from Java 7:
http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29
http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29
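As a rough illustration of that suggestion, here is a minimal sketch of the wrapper rewritten with toByteArray() and valueOf(byte[]). It assumes Java 7 or later. The class name BitSetWritableSketch and the get()/set() accessors are hypothetical and not from the original post. To keep the sketch runnable without Hadoop on the classpath, the class does not declare "implements Writable"; the two methods only mirror the signatures of Writable.write(DataOutput) and Writable.readFields(DataInput).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical name. In a real job this class would also declare
// "implements org.apache.hadoop.io.Writable" (assumption, not shown here).
class BitSetWritableSketch {

    private BitSet bs = new BitSet();

    BitSet get() {
        return bs;
    }

    void set(BitSet bs) {
        this.bs = bs;
    }

    // Same signature as Writable.write(DataOutput): a length prefix
    // followed by the raw little-endian bytes of the bit set.
    public void write(DataOutput out) throws IOException {
        byte[] bytes = bs.toByteArray(); // Java 7+
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Same signature as Writable.readFields(DataInput): read the length
    // prefix, then rebuild the bit set from exactly that many bytes.
    public void readFields(DataInput in) throws IOException {
        int len = in.readInt();
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        bs = BitSet.valueOf(bytes); // Java 7+
    }

    // Small round-trip demonstration through in-memory streams.
    public static void main(String[] args) throws IOException {
        BitSet bits = new BitSet();
        bits.set(3);
        bits.set(64);
        bits.set(1000);

        BitSetWritableSketch source = new BitSetWritableSketch();
        source.set(bits);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        source.write(new DataOutputStream(buffer));

        BitSetWritableSketch copy = new BitSetWritableSketch();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println("round trip equal: " + copy.get().equals(bits));
    }
}
```

This still goes through an intermediate byte array, but it drops the ObjectOutputStream layer and its stream header and class metadata. A variant that avoids the byte array would loop over bs.toLongArray() and call out.writeLong() for each word.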
Regards
Bertrand
On Sun, May 12, 2013 at 8:24 PM, Jim Twensky <ji...@gmail.com> wrote:
> I have large java.util.BitSet objects that I want to bitwise-OR using a
> MapReduce job. I decided to wrap around each object using the Writable
> interface. Right now I convert each BitSet to a byte array and serialize
> the byte array on disk.
>
> Converting them to byte arrays is a bit inefficient but I could not find a
> work around to write them directly to the DataOutput. Is there a way to
> skip this and serialize the object directly? Here is what my current
> implementation looks like:
>
> public class BitSetWritable implements Writable {
>
> private BitSet bs;
>
> public BitSetWritable() {
> this.bs = new BitSet();
> }
>
> @Override
> public void write(DataOutput out) throws IOException {
>
> ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
> ObjectOutputStream oos = new ObjectOutputStream(bos);
> oos.writeObject(bs);
> byte[] bytes = bos.toByteArray();
> oos.close();
> out.writeInt(bytes.length);
> out.write(bytes);
>
> }
>
> @Override
> public void readFields(DataInput in) throws IOException {
>
> int len = in.readInt();
> byte[] bytes = new byte[len];
> in.readFully(bytes);
>
> ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
> ObjectInputStream ois = new ObjectInputStream(bis);
> try {
> bs = (BitSet) ois.readObject();
> } catch (ClassNotFoundException e) {
> throw new IOException(e);
> }
>
> ois.close();
> }
>
> }
>
--
Bertrand Dechoux
Re: Wrapping around BitSet with the Writable interface
Posted by Harsh J <ha...@cloudera.com>.
You could consider using the experimental JavaSerialization [1]
enhancement to skip transforming to
Writables/other-serialization-formats. It may be slower, but it looks like
you are looking for a way to avoid transforming objects.
Enable by adding the class
org.apache.hadoop.io.serializer.JavaSerialization to the list of
io.serializations like so in your client configuration:
<property>
<name>io.serializations</name>
<value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization,org.apache.hadoop.io.serializer.JavaSerialization</value>
</property>
And you should then be able to rely on Java's inbuilt serialization to
directly serialize your BitSet object?
[1] - http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/serializer/JavaSerialization.html
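To get a feel for what Java's built-in serialization does with a BitSet, the following stand-alone sketch round-trips one through ObjectOutputStream and prints the encoded size next to the size of toByteArray(). It uses only the JDK and does not touch Hadoop. The class and method names are hypothetical, and toByteArray() assumes Java 7 or later. Note that the sketch closes the ObjectOutputStream before reading the buffer, so that any bytes it has buffered are flushed first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.BitSet;

// Hypothetical helper class, for illustration only.
class JavaSerializationSketch {

    // Encode a BitSet with Java's built-in serialization.
    static byte[] serialize(BitSet bs) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // try-with-resources closes (and therefore flushes) the stream
        // before the buffer is read.
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(bs);
        }
        return bos.toByteArray();
    }

    // Decode a BitSet that was produced by serialize().
    static BitSet deserialize(byte[] bytes) throws IOException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (BitSet) ois.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        BitSet bs = new BitSet();
        bs.set(1);
        bs.set(500);

        byte[] viaJava = serialize(bs);
        System.out.println("Java serialization: " + viaJava.length + " bytes");
        System.out.println("toByteArray():      " + bs.toByteArray().length + " bytes");
        System.out.println("round trip equal:   " + deserialize(viaJava).equals(bs));
    }
}
```

The serialized form carries a stream header and class description in addition to the bit data, so for small bit sets it is noticeably larger than the raw bytes; for very large bit sets that fixed overhead matters less.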
On Sun, May 12, 2013 at 11:54 PM, Jim Twensky <ji...@gmail.com> wrote:
> I have large java.util.BitSet objects that I want to bitwise-OR using a
> MapReduce job. I decided to wrap around each object using the Writable
> interface. Right now I convert each BitSet to a byte array and serialize the
> byte array on disk.
>
> Converting them to byte arrays is a bit inefficient but I could not find a
> work around to write them directly to the DataOutput. Is there a way to skip
> this and serialize the object directly? Here is what my current
> implementation looks like:
>
> public class BitSetWritable implements Writable {
>
> private BitSet bs;
>
> public BitSetWritable() {
> this.bs = new BitSet();
> }
>
> @Override
> public void write(DataOutput out) throws IOException {
>
> ByteArrayOutputStream bos = new ByteArrayOutputStream(bs.size()/8);
> ObjectOutputStream oos = new ObjectOutputStream(bos);
> oos.writeObject(bs);
> byte[] bytes = bos.toByteArray();
> oos.close();
> out.writeInt(bytes.length);
> out.write(bytes);
>
> }
>
> @Override
> public void readFields(DataInput in) throws IOException {
>
> int len = in.readInt();
> byte[] bytes = new byte[len];
> in.readFully(bytes);
>
> ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
> ObjectInputStream ois = new ObjectInputStream(bis);
> try {
> bs = (BitSet) ois.readObject();
> } catch (ClassNotFoundException e) {
> throw new IOException(e);
> }
>
> ois.close();
> }
>
> }
--
Harsh J