Posted to issues@kylin.apache.org by "Wang Ken (JIRA)" <ji...@apache.org> on 2016/08/25 10:28:20 UTC

[jira] [Commented] (KYLIN-1723) GTAggregateScanner$Dump.flush() must not write the WHOLE metrics buffer

    [ https://issues.apache.org/jira/browse/KYLIN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436636#comment-15436636 ] 

Wang Ken commented on KYLIN-1723:
---------------------------------

The original implementation in Dump.flush() is actually wrong. It is a critical bug for versions prior to Kylin 1.5.3: it causes the HBase coprocessor to return incorrect results when spill to disk happens, and the fast cube building result can also be wrong when spill to disk happens.
This is because in Java serialization, if we call writeObject multiple times with the same object instance, the stream serializes the contents only once; every later call just writes a back-reference to that first serialized instance.

https://docs.oracle.com/javase/7/docs/platform/serialization/spec/output.html
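A minimal, self-contained sketch (illustration only, not Kylin code) of the back-reference behavior: the same array instance is written twice, mutated in between, and both reads come back with the first contents.

import java.io.*;
import java.util.Arrays;

public class BackReferenceDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);

        byte[] buf = new byte[4];   // one instance reused, like metricsBuf.array()
        buf[0] = 1;
        oos.writeObject(buf);       // full contents written here
        buf[0] = 2;                 // mutate the buffer before the next write
        oos.writeObject(buf);       // only a back-reference is written; new contents are lost
        oos.close();

        ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        byte[] first  = (byte[]) ois.readObject();
        byte[] second = (byte[]) ois.readObject();
        // Both print [1, 0, 0, 0]; the second read resolves to the stale first copy.
        System.out.println(Arrays.toString(first));
        System.out.println(Arrays.toString(second));
    }
}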

    /**
     * Writes an "unshared" object to the ObjectOutputStream.  This method is
     * identical to writeObject, except that it always writes the given object
     * as a new, unique object in the stream (as opposed to a back-reference
     * pointing to a previously serialized instance).  Specifically:
     * <ul>
     *   <li>An object written via writeUnshared is always serialized in the
     *       same manner as a newly appearing object (an object that has not
     *       been written to the stream yet), regardless of whether or not the
     *       object has been written previously.
     *
     *   <li>If writeObject is used to write an object that has been previously
     *       written with writeUnshared, the previous writeUnshared operation
     *       is treated as if it were a write of a separate object.  In other
     *       words, ObjectOutputStream will never generate back-references to
     *       object data written by calls to writeUnshared.
     * </ul>
     * While writing an object via writeUnshared does not in itself guarantee a
     * unique reference to the object when it is deserialized, it allows a
     * single object to be defined multiple times in a stream, so that multiple
     * calls to readUnshared by the receiver will not conflict.  Note that the
     * rules described above only apply to the base-level object written with
     * writeUnshared, and not to any transitively referenced sub-objects in the
     * object graph to be serialized.
     *
     * <p>ObjectOutputStream subclasses which override this method can only be
     * constructed in security contexts possessing the
     * "enableSubclassImplementation" SerializablePermission; any attempt to
     * instantiate such a subclass without this permission will cause a
     * SecurityException to be thrown.
     *
     * @param   obj object to write to stream
     * @throws  NotSerializableException if an object in the graph to be
     *          serialized does not implement the Serializable interface
     * @throws  InvalidClassException if a problem exists with the class of an
     *          object to be serialized
     * @throws  IOException if an I/O error occurs during serialization
     * @since 1.4
     */

This is wrong:
oos.writeObject(metricsBuf.array());

This is correct, but the file will be huge (the whole pre-allocated buffer is written every time):
oos.writeUnshared(metricsBuf.array());

This is correct, and the file is small:
oos.writeObject(Arrays.copyOf(metricsBuf.array(), metricsBuf.position()));

This is also correct, and the file is small:
oos.writeInt(metricsBuf.position());
oos.write(metricsBuf.array(), 0, metricsBuf.position());
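A sketch of how the last variant pairs with its read side (method names writeMetrics/readMetrics are illustrative, not the actual GTAggregateScanner code): the reader must mirror the writer, reading the length first and then exactly that many bytes with readFully, since a plain read() may return fewer bytes than requested.

import java.io.*;
import java.nio.ByteBuffer;

class MetricsSpillSketch {
    // Write only the used portion of the buffer, prefixed by its length.
    static void writeMetrics(ObjectOutputStream oos, ByteBuffer metricsBuf) throws IOException {
        oos.writeInt(metricsBuf.position());
        oos.write(metricsBuf.array(), 0, metricsBuf.position());
    }

    // Read the length, then exactly that many bytes back.
    static byte[] readMetrics(ObjectInputStream ois) throws IOException {
        int len = ois.readInt();
        byte[] metrics = new byte[len];
        ois.readFully(metrics);
        return metrics;
    }
}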



> GTAggregateScanner$Dump.flush() must not write the WHOLE metrics buffer
> -----------------------------------------------------------------------
>
>                 Key: KYLIN-1723
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1723
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: liyang
>            Assignee: Dong Li
>             Fix For: v1.5.3
>
>
> GTAggregateScanner$Dump.flush() must not write the WHOLE metrics buffer, but only the part that contains data.
> Note the metrics buffer is allocated at the max possible size of metrics, which can be way larger than actual size.


