Posted to issues-all@impala.apache.org by "Kris Hahn (Jira)" <ji...@apache.org> on 2020/03/07 21:23:00 UTC

[jira] [Comment Edited] (IMPALA-8405) Document UDA state machine

    [ https://issues.apache.org/jira/browse/IMPALA-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054184#comment-17054184 ] 

Kris Hahn edited comment on IMPALA-8405 at 3/7/20, 9:22 PM:
------------------------------------------------------------

The diagram seems to disagree with Tim's statement in the community article: "... for the common case where query execution is entirely in memory - it [serialization] happens after Merge() before the data needs to be sent over the network," unless the "Init for Merge (no const args)" step is what performs the serialization. Is the diagram correct? The docs describe every function except Init for Merge. What does Init for Merge do?

Attached is a horizontal version of Peter's diagram for the docs page (less scrolling for the user).  
 !udaf_state_machine.png!

Looks like there are no diagrams/images in the entire doc set. Maybe the publishing system isn't set up to handle images, in which case, we can use text:
{noformat}
Init 
  |
Update
  |
Out of memory? YES ------  
  NO                     |
  |                      | Spill to disk
  |                      |
Distributed? YES --> Serialize --> Init for Merge --> Merge --> Finalize
  NO                                     |
  |                                      |
   --------------------------------------{noformat}
 * Distributed: number of nodes greater than 1, or not a local aggregation
 * Serialize: frees memory
 * Init for Merge: no const args
 * Finalize: frees memory
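
For anyone who would rather read the flow as code, here is a minimal sketch of one reading of the text diagram above. It is not taken from Impala: the function name uda_call_sequence and its two flags are hypothetical, and it assumes that the "Out of memory? YES" branch leads into Serialize and that the "Distributed? NO" connector rejoins the flow at Init for Merge, which is where the connector appears to land. If the diagram turns out to be wrong, or the connector is meant to rejoin elsewhere, only the branch logic changes.

```python
from typing import List


def uda_call_sequence(out_of_memory: bool, distributed: bool) -> List[str]:
    """Return the order of UDA steps implied by one reading of the text diagram.

    Both parameters are hypothetical stand-ins for the two decision points
    in the diagram; they are not Impala settings.
    """
    calls = ["Init", "Update"]

    # Assumption: both the spill-to-disk branch and the distributed branch
    # pass through Serialize before the merge phase.
    if out_of_memory or distributed:
        calls.append("Serialize")

    # Assumption: the "NO" connector rejoins the flow at Init for Merge,
    # so the local, in-memory case skips only Serialize.
    calls += ["Init for Merge", "Merge", "Finalize"]
    return calls


if __name__ == "__main__":
    for oom in (False, True):
        for dist in (False, True):
            steps = " -> ".join(uda_call_sequence(oom, dist))
            print(f"out_of_memory={oom}, distributed={dist}: {steps}")
```

Under these assumptions, the only difference between the four cases is whether Serialize appears between Update and Init for Merge.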


was (Author: krishahn):
Attached is a horizontal version of Peter's diagram for the docs page (less scrolling for the user). 
!udaf_state_machine.png!

Looks like there are no diagrams/images in the entire doc set. Maybe the publishing system isn't set up to handle images, in which case, we can use text:
{noformat}
Init 
  |
Update
  |
Out of memory? YES ------
  NO                     |
  |                      |
  |                      |
Distributed? YES --> Serialize --> Init for Merge --> Merge --> Finalize
  NO                                     |
  |                                      |
   --------------------------------------{noformat}
 * Serialize frees memory
 * Init for Merge rejects const args
 * Finalize frees memory

> Document UDA state machine
> --------------------------
>
>                 Key: IMPALA-8405
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8405
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Docs
>            Reporter: Tim Armstrong
>            Assignee: Kris Hahn
>            Priority: Major
>              Labels: impala_user_docs_open
>         Attachments: screenshot-1.png, udaf_state_machine.png
>
>
> The documentation in "The Underlying Functions for a UDA" doesn't do a good job of explaining the state transitions that a UDA can go through, for example when Serialize() is called. It's complicated because data needs to be serialized to go over the network, but is *sometimes* also serialized to spill to disk, which changes the sequence of function calls.
> See https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Function-execution-flow-in-UDAs-and-memory-implications-for/m-p/88892#M5532?eid=1&aid=1 for a user who is trying to understand this.
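
To make the role of Serialize() in the description above concrete, here is a toy average aggregate written as plain Python functions. It is only an illustration under stated assumptions: the function names mirror the UDA function names but are Python stand-ins, not Impala's C++ UDA API; the (sum, count) state layout, the struct format, and the average() driver are all hypothetical. Each input partition plays the role of one node, and the serialized bytes play the role of the data sent over the network.

```python
import struct
from typing import Iterable, List, Optional, Tuple

State = Tuple[float, int]  # hypothetical intermediate state: (running sum, row count)


def init() -> State:
    return (0.0, 0)


def update(state: State, value: float) -> State:
    total, count = state
    return (total + value, count + 1)


def serialize(state: State) -> bytes:
    # Flatten the intermediate state into bytes that could be shipped
    # to another node or written to disk.
    return struct.pack("<dq", state[0], state[1])


def merge(state: State, serialized: bytes) -> State:
    # Merge consumes a serialized intermediate, not raw input rows.
    total, count = struct.unpack("<dq", serialized)
    return (state[0] + total, state[1] + count)


def finalize(state: State) -> Optional[float]:
    total, count = state
    return total / count if count else None


def average(partitions: Iterable[Iterable[float]]) -> Optional[float]:
    """Hypothetical driver: aggregate each partition, then merge the results."""
    shipped: List[bytes] = []
    for rows in partitions:
        state = init()
        for value in rows:
            state = update(state, value)
        shipped.append(serialize(state))

    merged = init()  # plays the role of "Init for Merge"
    for blob in shipped:
        merged = merge(merged, blob)
    return finalize(merged)
```

For example, average([[1.0, 2.0], [3.0]]) returns 2.0, and the merge side only ever sees the serialized intermediates. In this toy version a spill to disk would reuse the same serialize() and merge() pair, which is why spilling changes the sequence of calls.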



