Posted to issues@arrow.apache.org by "Liya Fan (Jira)" <ji...@apache.org> on 2020/05/26 01:46:00 UTC

[jira] [Commented] (ARROW-8909) [Java] Out of order writes using setSafe

    [ https://issues.apache.org/jira/browse/ARROW-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116350#comment-17116350 ] 

Liya Fan commented on ARROW-8909:
---------------------------------

[~saurabhm] Thank you for reporting the problem.
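I think this behavior is by design. For variable-width vectors, we do not support setting values in random order, as this might cause a severe performance penalty: the values are stored back to back in a single data buffer, so changing the length of an earlier value would mean shifting every value that follows it.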
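
To illustrate why, here is a minimal sketch in plain Java. It is a simplified toy model written for this explanation, not Arrow's actual implementation: the class OffsetModel and its methods are hypothetical names, and the assumption is only that a variable-width vector keeps one data buffer, one offsets buffer, and a marker for the last index written. Under that assumption it reproduces the values in the report quoted below.

```java
import java.nio.charset.StandardCharsets;

/** Toy model (hypothetical, not Arrow code): a data buffer plus an offsets buffer. */
final class OffsetModel {
    private final byte[] data;
    private final int[] offsets;   // value i occupies data[offsets[i] .. offsets[i + 1])
    private int lastSet = -1;      // last index that was written

    OffsetModel(int valueCapacity, int dataCapacity) {
        this.offsets = new int[valueCapacity + 1];
        this.data = new byte[dataCapacity];
    }

    /** Marks every slot between lastSet and index as empty by copying the previous offset. */
    private void fillHoles(int index) {
        for (int i = lastSet + 1; i < index; i++) {
            offsets[i + 1] = offsets[i];
        }
    }

    /** Writes a value at index; only the end offset of this one slot is updated. */
    void set(int index, String value) {
        fillHoles(index);
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int start = offsets[index];
        System.arraycopy(bytes, 0, data, start, bytes.length);
        offsets[index + 1] = start + bytes.length;
        lastSet = index;           // an out-of-order write moves this marker backwards
    }

    /** Finalizes the vector; slots after lastSet are treated as never written. */
    void setValueCount(int count) {
        fillHoles(count);
        lastSet = count - 1;
    }

    String get(int index) {
        int start = offsets[index];
        return new String(data, start, offsets[index + 1] - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OffsetModel vec = new OffsetModel(10, 128);
        for (int i = 0; i < 10; i++) {
            vec.set(i, i + "_mtest");
        }
        vec.set(7, "7_new");       // shorter than the old value, written out of order
        System.out.println("index 8 before: " + vec.get(8));   // st8_mtest
        vec.setValueCount(10);
        System.out.println("index 8 after: " + vec.get(8));    // empty string
    }
}
```

In this model, the write at index 7 moves the start of slot 8 back into the leftover bytes of the old value, and the later setValueCount call then treats slots 8 and 9 as holes. Avoiding that would require rewriting the data and offsets of every later slot on each out-of-order write.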

> [Java] Out of order writes using setSafe
> ----------------------------------------
>
>                 Key: ARROW-8909
>                 URL: https://issues.apache.org/jira/browse/ARROW-8909
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Saurabh
>            Priority: Major
>
> I noticed that calling setSafe on a VarCharVector with indices that are not in increasing order causes the lastIndex to be set to the index used in the last call to setSafe.
> Is this documented and expected behavior?
> Sample code:
> {code:java}
> import java.util.Collections;
> import lombok.extern.slf4j.Slf4j;
> import org.apache.arrow.memory.RootAllocator;
> import org.apache.arrow.vector.VarCharVector;
> import org.apache.arrow.vector.VectorSchemaRoot;
> import org.apache.arrow.vector.types.pojo.ArrowType;
> import org.apache.arrow.vector.types.pojo.Field;
> import org.apache.arrow.vector.types.pojo.Schema;
> import org.apache.arrow.vector.util.Text;
> @Slf4j
> public class ATest {
>   public static void main(String[] args) {
>     Schema schema = new Schema(Collections.singletonList(Field.nullable("Data", new ArrowType.Utf8())));
>     try (VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, new RootAllocator())) {
>       VarCharVector vec = (VarCharVector) vroot.getVector("Data");
>       for (int i = 0; i < 10; i++) {
>         vec.setSafe(i, new Text(Integer.toString(i) + "_mtest"));
>       }
>       vec.setSafe(7, new Text(Integer.toString(7) + "_new"));
>       log.info("Data at index 8 Before {}", vec.getObject(8));
>       vroot.setRowCount(10);
>       log.info("Data at index 8 After {}", vec.getObject(8));
>       log.info(vroot.contentToTSVString());
>     }
>   }
> }
> {code}
>  
> If I don't set index 7 after the loop, I get all of the entries 0_mtest, 1_mtest, ..., 9_mtest.
> If I set index 7 after the loop, I see 0_mtest, ..., 5_mtest, 6_mtest, 7_new, and then:
>     Before the setRowCount call, the data at index 8 is *st8_mtest* and the data at index 9 is *9_mtest*.
>     After the setRowCount call, the data at index 8 is "" and the data at index 9 is "".
> If the new text is longer than the 4 characters of _new, it keeps eating into the data at the following indices.
>  


