Posted to dev@avro.apache.org by "Libing Sun (JIRA)" <ji...@apache.org> on 2012/11/08 08:32:15 UTC

[jira] [Created] (AVRO-1199) SortedKeyValueFile$Writer.append method have a puzzle for the sorted key

Libing Sun created AVRO-1199:
--------------------------------

             Summary: SortedKeyValueFile$Writer.append method have a puzzle for the sorted key
                 Key: AVRO-1199
                 URL: https://issues.apache.org/jira/browse/AVRO-1199
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.7.3
            Reporter: Libing Sun


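Line 539 of SortedKeyValueFile.java reads:
mPreviousKey = key;

This class is similar to Hadoop's MapFile. MapFile's equivalent method keeps a copy of the previous key, but this code simply assigns the reference with "=".

Because of that, if a user appends a reused key object (mutating the same object between calls), mPreviousKey changes along with it. Any comparison against the previous key then effectively compares the new key with itself, so the sorted-key-order check is no longer valid.

I think the following code will fix it: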

private DataInputBuffer inBuf = new DataInputBuffer();
private DataOutputBuffer outBuf = new DataOutputBuffer();

      // Serialize the key into outBuf, then read it back as an independent copy.
      outBuf.reset();
      BinaryEncoder encoder = EncoderFactory.get().directBinaryEncoder(outBuf, null);
      GenericDatumWriter<K> writer = new ReflectDatumWriter<K>(mKeySchema);
      GenericDatumReader<K> reader = new ReflectDatumReader<K>(mKeySchema);
      writer.write(key, encoder);
      encoder.flush();
      inBuf.reset(outBuf.getData(), outBuf.getLength());
      BinaryDecoder decoder = DecoderFactory.get().directBinaryDecoder(inBuf, null);
      mPreviousKey = reader.read(null, decoder);
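To make the aliasing problem described above concrete, here is a minimal, self-contained sketch in plain Java. OrderCheckingWriter and MutableKey are hypothetical stand-ins, not Avro classes; the writer only mimics the idea of rejecting a key that is smaller than the previous one. Retaining the reference (as "mPreviousKey = key;" does) lets a reused, out-of-order key slip through, while retaining a copy catches it.

```java
import java.util.function.UnaryOperator;

// Hypothetical, stripped-down stand-in for an order-checking writer; not Avro code.
public class ReusedKeyDemo {

  /** A mutable key object that a caller might reuse between append() calls. */
  static final class MutableKey {
    int value;
    MutableKey(int value) { this.value = value; }
  }

  static final class OrderCheckingWriter {
    private final UnaryOperator<MutableKey> remember; // how the previous key is retained
    private MutableKey previousKey;

    OrderCheckingWriter(UnaryOperator<MutableKey> remember) { this.remember = remember; }

    void append(MutableKey key) {
      if (previousKey != null && key.value < previousKey.value) {
        throw new IllegalArgumentException("keys must be appended in sorted order");
      }
      previousKey = remember.apply(key);
    }
  }

  /** Appends 5 then 3 using one reused key object; returns true if the writer rejected it. */
  static boolean detectsOutOfOrder(UnaryOperator<MutableKey> remember) {
    OrderCheckingWriter writer = new OrderCheckingWriter(remember);
    MutableKey key = new MutableKey(5);
    writer.append(key);
    key.value = 3; // the caller reuses the same object for the next record
    try {
      writer.append(key);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Like "mPreviousKey = key;": previousKey is the caller's object, so it changes too.
    System.out.println("alias: " + detectsOutOfOrder(k -> k));
    // Like keeping a copy: previousKey is unaffected by later mutations.
    System.out.println("copy:  " + detectsOutOfOrder(k -> new MutableKey(k.value)));
  }
}
```

Running main prints "alias: false" followed by "copy:  true".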

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (AVRO-1199) SortedKeyValueFile$Writer.append method have a puzzle for the sorted key

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated AVRO-1199:
-------------------------------

    Attachment: AVRO-1199.patch

Here's a new version of the patch.

I changed deepCopy() to be a generic method so that callers don't have to cast the result, and updated existing calls to drop their casts.

I'll commit this version soon unless someone objects.
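A hypothetical toy illustration of what making the method generic buys the caller. The copier below handles only Lists and Maps and is not Avro's implementation (the real GenericData.deepCopy also takes a Schema); the point is only that the unchecked cast moves from every call site into one place.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy copier illustrating the signature change; not Avro's GenericData code.
public class DeepCopySignatures {

  /** Old style: returns Object, so every caller must cast. */
  static Object deepCopyObject(Object value) {
    if (value instanceof List) {
      List<Object> copy = new ArrayList<>();
      for (Object element : (List<?>) value) copy.add(deepCopyObject(element));
      return copy;
    }
    if (value instanceof Map) {
      Map<Object, Object> copy = new HashMap<>();
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        copy.put(e.getKey(), deepCopyObject(e.getValue())); // keys reused, values copied
      }
      return copy;
    }
    return value; // everything else (String, Integer, ...) is treated as immutable
  }

  /** New style: a generic method, so the unchecked cast lives in one place. */
  @SuppressWarnings("unchecked")
  static <T> T deepCopy(T value) {
    return (T) deepCopyObject(value);
  }

  public static void main(String[] args) {
    List<List<Integer>> original = new ArrayList<>();
    original.add(new ArrayList<>(List.of(1, 2)));

    // Old call site needs a cast:
    @SuppressWarnings("unchecked")
    List<List<Integer>> a = (List<List<Integer>>) deepCopyObject(original);
    // New call site does not:
    List<List<Integer>> b = deepCopy(original);

    original.get(0).add(3); // mutate the original after copying
    System.out.println(a + " " + b + " " + original);
  }
}
```

Running main prints "[[1, 2]] [[1, 2]] [[1, 2, 3]]": both copies are independent of the original, but only the old-style call site needed a cast.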
                
> SortedKeyValueFile$Writer.append method have a puzzle for the sorted key
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1199
>                 URL: https://issues.apache.org/jira/browse/AVRO-1199
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.3
>            Reporter: Libing Sun
>              Labels: patch
>             Fix For: 1.7.3
>
>         Attachments: AVRO-1199.patch, AVRO-1199.path
>

[jira] [Commented] (AVRO-1199) SortedKeyValueFile$Writer.append method have a puzzle for the sorted key

Posted by "Libing Sun (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493714#comment-13493714 ] 

Libing Sun commented on AVRO-1199:
----------------------------------

Why not serialize and deserialize? How does the performance of the two approaches compare?
I have used Avro in MapReduce for years, and I always clone objects with a method like the following:

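import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.*;
import org.apache.avro.io.*;
import org.apache.avro.reflect.*;
import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;

// Shared static buffers: this helper is not thread-safe.
static DataOutputBuffer out = new DataOutputBuffer();
static DataInputBuffer in = new DataInputBuffer();

public static <V> V clone(V v) throws IOException {
    out.reset();
    BinaryEncoder encoder = EncoderFactory.get().directBinaryEncoder(out, null);
    // Use the object's own schema if it carries one; otherwise derive it by reflection.
    Schema schema = (v instanceof GenericContainer)
        ? ((GenericContainer) v).getSchema()
        : ReflectData.get().getSchema(v.getClass());

    GenericDatumWriter<V> writer = new ReflectDatumWriter<V>(schema);
    GenericDatumReader<V> reader = new ReflectDatumReader<V>(schema);
    writer.write(v, encoder);               // serialize v into out
    encoder.flush();
    in.reset(out.getData(), out.getLength());
    BinaryDecoder decoder = DecoderFactory.get().directBinaryDecoder(in, null);
    return reader.read(null, decoder);      // decode a fresh, independent copy
}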

Another question (I'm not sure whether this is the right place to ask; if not, I apologize): Hadoop's MapFile uses an array to store the index, but SortedKeyValueFile uses a TreeMap. Why? How do the two compare in time and space efficiency?
I ask only because the Hadoop project is so closely related to Avro.
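For comparison, a rough sketch of the two index layouts being asked about, assuming the index maps a sample of keys to file offsets and a lookup needs the greatest indexed key not exceeding the target. This is not the actual MapFile or SortedKeyValueFile code; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of two sparse-index layouts (sampled key -> file offset).
// Neither method is the real MapFile or SortedKeyValueFile code.
public class SparseIndexDemo {

  /** MapFile-style: parallel sorted arrays searched with binary search. */
  static long floorOffsetArray(String[] keys, long[] offsets, String target) {
    int i = Arrays.binarySearch(keys, target);
    if (i < 0) i = -i - 2; // (insertion point - 1) = index of last key <= target
    return i < 0 ? -1 : offsets[i];
  }

  /** TreeMap-style: a red-black tree queried with floorEntry. */
  static long floorOffsetTree(TreeMap<String, Long> index, String target) {
    Map.Entry<String, Long> e = index.floorEntry(target);
    return e == null ? -1 : e.getValue();
  }

  public static void main(String[] args) {
    String[] keys = {"apple", "kiwi", "pear"};
    long[] offsets = {0L, 4096L, 8192L};
    TreeMap<String, Long> tree = new TreeMap<>();
    for (int i = 0; i < keys.length; i++) tree.put(keys[i], offsets[i]);

    for (String target : new String[] {"aardvark", "banana", "kiwi", "zebra"}) {
      System.out.println(target + " -> " + floorOffsetArray(keys, offsets, target)
          + " / " + floorOffsetTree(tree, target));
    }
  }
}
```

Both lookups are O(log n) and return the same offsets here (-1 meaning "before the first indexed key"). The array version stores keys and primitive longs contiguously, whereas a TreeMap allocates an entry object plus a boxed Long per key, so it typically uses more memory per index entry; in exchange it provides a ready-made NavigableMap API.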
                
> SortedKeyValueFile$Writer.append method have a puzzle for the sorted key
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1199
>                 URL: https://issues.apache.org/jira/browse/AVRO-1199
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.3
>            Reporter: Libing Sun
>              Labels: patch
>

[jira] [Updated] (AVRO-1199) SortedKeyValueFile$Writer.append method have a puzzle for the sorted key

Posted by "Libing Sun (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Libing Sun updated AVRO-1199:
-----------------------------

    Fix Version/s: 1.7.3
     Hadoop Flags: Reviewed
           Status: Patch Available  (was: Open)

I am submitting a patch for this.
                
> SortedKeyValueFile$Writer.append method have a puzzle for the sorted key
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1199
>                 URL: https://issues.apache.org/jira/browse/AVRO-1199
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.3
>            Reporter: Libing Sun
>              Labels: patch
>             Fix For: 1.7.3
>

[jira] [Commented] (AVRO-1199) SortedKeyValueFile$Writer.append method have a puzzle for the sorted key

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493314#comment-13493314 ] 

Doug Cutting commented on AVRO-1199:
------------------------------------

Yes, this looks like a bug that we should fix.

A simpler way to fix it might be to change this to:

mPreviousKey = GenericData.get().deepCopy(mKeySchema, key);

Would you like to prepare a patch for this?

https://cwiki.apache.org/AVRO/how-to-contribute.html
                
> SortedKeyValueFile$Writer.append method have a puzzle for the sorted key
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1199
>                 URL: https://issues.apache.org/jira/browse/AVRO-1199
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.3
>            Reporter: Libing Sun
>              Labels: patch
>

[jira] [Resolved] (AVRO-1199) SortedKeyValueFile$Writer.append method have a puzzle for the sorted key

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting resolved AVRO-1199.
--------------------------------

    Resolution: Fixed
      Assignee: Doug Cutting

I added a test and committed this.
                
> SortedKeyValueFile$Writer.append method have a puzzle for the sorted key
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1199
>                 URL: https://issues.apache.org/jira/browse/AVRO-1199
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.3
>            Reporter: Libing Sun
>            Assignee: Doug Cutting
>              Labels: patch
>             Fix For: 1.7.3
>
>         Attachments: AVRO-1199.patch, AVRO-1199.path
>

[jira] [Updated] (AVRO-1199) SortedKeyValueFile$Writer.append method have a puzzle for the sorted key

Posted by "Libing Sun (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Libing Sun updated AVRO-1199:
-----------------------------

    Attachment: AVRO-1199.path

A patch to fix bug AVRO-1199.
                
> SortedKeyValueFile$Writer.append method have a puzzle for the sorted key
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1199
>                 URL: https://issues.apache.org/jira/browse/AVRO-1199
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.3
>            Reporter: Libing Sun
>              Labels: patch
>             Fix For: 1.7.3
>
>         Attachments: AVRO-1199.path
>

[jira] [Updated] (AVRO-1199) SortedKeyValueFile$Writer.append method have a puzzle for the sorted key

Posted by "Libing Sun (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AVRO-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Libing Sun updated AVRO-1199:
-----------------------------

    Status: Open  (was: Patch Available)

No patch attached.
                
> SortedKeyValueFile$Writer.append method have a puzzle for the sorted key
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1199
>                 URL: https://issues.apache.org/jira/browse/AVRO-1199
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.3
>            Reporter: Libing Sun
>              Labels: patch
>             Fix For: 1.7.3
>

[jira] [Commented] (AVRO-1199) SortedKeyValueFile$Writer.append method have a puzzle for the sorted key

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AVRO-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494156#comment-13494156 ] 

Doug Cutting commented on AVRO-1199:
------------------------------------

> Why not use the serialize and deserialize

Using the deepCopy() method is simpler and should be more efficient. It didn't use to exist, so one had to serialize and deserialize in order to clone an object.
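As a rough, stdlib-only analogy (java.io serialization stands in for Avro's binary encoding, and neither method is Avro code), the two cloning strategies look like this. The round-trip version fills an intermediate byte buffer and runs a full encode and decode, while the direct version just walks the structure and copies it, which is the intuition behind expecting deepCopy() to be cheaper. This illustrates the mechanism only; it is not a benchmark.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Stdlib analogy only: java.io serialization stands in for Avro's binary encoding.
public class CloneStrategies {

  /** Clone by encoding to bytes and decoding them again. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T cloneBySerialization(T value)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value); // full encode into an intermediate buffer
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (T) in.readObject(); // full decode back into new objects
    }
  }

  /** Clone by walking the structure directly; no intermediate bytes. */
  static ArrayList<ArrayList<Integer>> cloneDirectly(List<? extends List<Integer>> value) {
    ArrayList<ArrayList<Integer>> copy = new ArrayList<>(value.size());
    for (List<Integer> inner : value) {
      copy.add(new ArrayList<>(inner)); // Integers are immutable, so sharing them is safe
    }
    return copy;
  }

  public static void main(String[] args) throws Exception {
    ArrayList<ArrayList<Integer>> original = new ArrayList<>();
    original.add(new ArrayList<>(List.of(1, 2)));

    ArrayList<ArrayList<Integer>> viaBytes = cloneBySerialization(original);
    ArrayList<ArrayList<Integer>> direct = cloneDirectly(original);

    original.get(0).add(3); // a later mutation must not affect either copy
    System.out.println(viaBytes + " " + direct + " " + original);
  }
}
```

Running main prints "[[1, 2]] [[1, 2]] [[1, 2, 3]]": both strategies produce independent copies; they differ only in how much work they do to get there.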
                
> SortedKeyValueFile$Writer.append method have a puzzle for the sorted key
> ------------------------------------------------------------------------
>
>                 Key: AVRO-1199
>                 URL: https://issues.apache.org/jira/browse/AVRO-1199
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.3
>            Reporter: Libing Sun
>              Labels: patch
>
