Posted to issues@tez.apache.org by "Ádám Szita (Jira)" <ji...@apache.org> on 2022/02/25 10:04:00 UTC

[jira] [Created] (TEZ-4392) Streamed event serialization and distribution

Ádám Szita created TEZ-4392:
-------------------------------

             Summary: Streamed event serialization and distribution
                 Key: TEZ-4392
                 URL: https://issues.apache.org/jira/browse/TEZ-4392
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Ádám Szita


Tez currently compiles the full list of events for a given job, then serializes every event into another list before starting to distribute the events to executor instances.

This way all the events, and their serialized copies, are held in memory at the same time, which in some cases can take up a lot of space (e.g. 1 MB split size × thousands of splits, which adds up to gigabytes). It would be more memory-efficient to do this in a streamed way, that is, to serialize each event right before it is sent out to an executor, not earlier.
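
To illustrate the idea, here is a minimal sketch of on-demand serialization. It is not Tez code: the class name StreamedSerializationSketch, the lazilySerialized helper and the string-based "events" are all hypothetical stand-ins. The wrapper serializes an event only when the consumer asks for the next one, so at most one serialized payload needs to be live at a time.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class StreamedSerializationSketch {

    /**
     * Hypothetical helper: wraps an event iterator so that each event is
     * serialized only when next() is called, not up front.
     */
    public static <E> Iterator<byte[]> lazilySerialized(
            Iterator<E> events, Function<E, byte[]> serializer) {
        return new Iterator<byte[]>() {
            @Override
            public boolean hasNext() {
                return events.hasNext();
            }

            @Override
            public byte[] next() {
                // Serialization happens here, right before the payload is handed out.
                return serializer.apply(events.next());
            }
        };
    }

    public static void main(String[] args) {
        // Strings stand in for real events; UTF-8 encoding stands in for real serialization.
        List<String> events = List.of("split-0", "split-1", "split-2");
        Iterator<byte[]> payloads =
                lazilySerialized(events.iterator(), e -> e.getBytes(StandardCharsets.UTF_8));

        while (payloads.hasNext()) {
            byte[] payload = payloads.next();
            // A real implementation would send the payload to an executor here.
            System.out.println("sending " + payload.length + " bytes");
        }
    }
}
```

Note that in this sketch the source list of events is still fully materialized; avoiding that as well would require the producer itself to hand out an Iterator.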

InputInitializer currently has the following methods that are relevant here:
{code:java}
public abstract List<Event> initialize() throws Exception;

public abstract void handleInputInitializerEvent(List<InputInitializerEvent> var1) throws Exception;{code}
Could these be changed to return/take an Iterator of Event/InputInitializerEvent instead of a List?
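
One possible shape for such a change is sketched below, purely as an assumption for discussion. Event, InputInitializerEvent, ListInputInitializer, StreamingInputInitializer and adapt are hypothetical stand-ins defined inside the sketch, not existing Tez API. The adapter shows how list-based initializers could keep working behind an Iterator-based signature.

```java
import java.util.Iterator;
import java.util.List;

public class IteratorInitializerSketch {

    /** Hypothetical stand-ins for the Tez event types named in the issue. */
    public interface Event {}
    public interface InputInitializerEvent {}

    /** Mirrors the list-based method shapes quoted in the issue. */
    public abstract static class ListInputInitializer {
        public abstract List<Event> initialize() throws Exception;
        public abstract void handleInputInitializerEvent(List<InputInitializerEvent> events)
                throws Exception;
    }

    /** Hypothetical Iterator-based variant (an assumption, not current Tez API). */
    public abstract static class StreamingInputInitializer {
        public abstract Iterator<Event> initialize() throws Exception;
        public abstract void handleInputInitializerEvent(Iterator<InputInitializerEvent> events)
                throws Exception;
    }

    /** Adapts a list-based initializer to the Iterator-based shape. */
    public static StreamingInputInitializer adapt(ListInputInitializer legacy) {
        return new StreamingInputInitializer() {
            @Override
            public Iterator<Event> initialize() throws Exception {
                // The legacy list is still built eagerly; only the interface is streamed.
                return legacy.initialize().iterator();
            }

            @Override
            public void handleInputInitializerEvent(Iterator<InputInitializerEvent> events)
                    throws Exception {
                // Drain the iterator into a list for the legacy callee.
                List<InputInitializerEvent> collected = new java.util.ArrayList<>();
                events.forEachRemaining(collected::add);
                legacy.handleInputInitializerEvent(collected);
            }
        };
    }
}
```

An adapter like this saves no memory by itself; the benefit would only come from initializers that implement the Iterator-based methods directly and produce events on demand.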



--
This message was sent by Atlassian Jira
(v8.20.1#820001)