Posted to commits@beam.apache.org by "Robert Burke (JIRA)" <ji...@apache.org> on 2018/07/25 16:48:00 UTC

[jira] [Commented] (BEAM-4852) [Go SDK] Beam should not retain the symbol table after function resolution

    [ https://issues.apache.org/jira/browse/BEAM-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555967#comment-16555967 ] 

Robert Burke commented on BEAM-4852:
------------------------------------

It looks like the simplest initial reduction would be to precache the address and name of every Subprogram (a global or file-static subroutine or function, as the DWARF spec defines them) the first time a lookup is needed in [symtab.go|https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/util/symtab/symtab.go#L107]. That replaces the linear scan through the debug info for each new function with a single scan.
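Here is a minimal Go sketch of the single-pass cache. It assumes an ELF binary that can be read with the standard library's debug/elf and debug/dwarf packages. The names buildSymbolCache, cacheFromELF and entryReader, and the synthetic-entry helpers, are hypothetical. They are not part of the Beam SDK, and the real symtab.go may be organized differently.

```go
package main

import (
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"os"
)

// entryReader is the one method of *dwarf.Reader this sketch needs; keeping it
// as an interface lets the scan also run over synthetic entries.
type entryReader interface {
	Next() (*dwarf.Entry, error)
}

// buildSymbolCache walks the debug info exactly once and records the name and
// entry address (DW_AT_low_pc) of every DW_TAG_subprogram entry.
func buildSymbolCache(r entryReader) (map[string]uint64, error) {
	cache := make(map[string]uint64)
	for {
		e, err := r.Next()
		if err != nil {
			return nil, err
		}
		if e == nil { // (nil, nil) marks the end of the section
			return cache, nil
		}
		if e.Tag != dwarf.TagSubprogram {
			continue
		}
		// Entries lacking either a name or a low PC are skipped.
		name, okName := e.Val(dwarf.AttrName).(string)
		addr, okAddr := e.Val(dwarf.AttrLowpc).(uint64)
		if okName && okAddr {
			cache[name] = addr
		}
	}
}

// cacheFromELF builds the cache from an ELF binary's DWARF data. The
// *dwarf.Data is only referenced locally, so once this returns it can be
// garbage collected and only the compact map remains.
func cacheFromELF(path string) (map[string]uint64, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	d, err := f.DWARF()
	if err != nil {
		return nil, err
	}
	return buildSymbolCache(d.Reader())
}

// sliceReader replays pre-built entries (demo/test helper).
type sliceReader struct{ entries []*dwarf.Entry }

func newSliceReader(entries ...*dwarf.Entry) *sliceReader {
	return &sliceReader{entries: entries}
}

func (s *sliceReader) Next() (*dwarf.Entry, error) {
	if len(s.entries) == 0 {
		return nil, nil
	}
	e := s.entries[0]
	s.entries = s.entries[1:]
	return e, nil
}

// subprogram and variable build synthetic entries for the demo.
func subprogram(name string, lowpc uint64) *dwarf.Entry {
	return &dwarf.Entry{Tag: dwarf.TagSubprogram, Field: []dwarf.Field{
		{Attr: dwarf.AttrName, Val: name, Class: dwarf.ClassString},
		{Attr: dwarf.AttrLowpc, Val: lowpc, Class: dwarf.ClassAddress},
	}}
}

func variable(name string) *dwarf.Entry {
	return &dwarf.Entry{Tag: dwarf.TagVariable, Field: []dwarf.Field{
		{Attr: dwarf.AttrName, Val: name, Class: dwarf.ClassString},
	}}
}

func main() {
	cache, err := buildSymbolCache(newSliceReader(subprogram("main.f", 0x1000), variable("main.v")))
	fmt.Println(cache, err)

	// Try the running binary itself (ELF only; fails if DWARF was stripped).
	if exe, err := os.Executable(); err == nil {
		symbols, err := cacheFromELF(exe)
		fmt.Println(len(symbols), "subprograms cached; err:", err)
	}
}
```

*dwarf.Reader already has a Next method, so the real reader and the synthetic one both satisfy entryReader. A Mach-O binary would need debug/macho instead of debug/elf. Subprogram entries that carry their name only through DW_AT_abstract_origin would also need extra handling that this sketch leaves out.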

We could probably pare down the number of cached symbols further by checking the compile unit's Language attribute (DW_AT_language) and caching only Go functions, in case C/C++ or other code is also linked into the binary. The Go language code, as defined by [Delve|https://github.com/derekparker/delve/blob/master/pkg/proc/bininfo.go] and the DWARF spec, is:
const dwarfGoLanguage = 22 // DW_LANG_Go (from DWARF v5, section 7.12, page 231)
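The language filter can be sketched using the constant above. This sketch assumes that DW_AT_language sits on the compile-unit entry. It also assumes that the reader returns entries in tree order, so remembering the most recent unit's language is enough. goSubprograms and the helpers are hypothetical names. The value 0x0c (DW_LANG_C99) stands in for a cgo-compiled C unit.

```go
package main

import (
	"debug/dwarf"
	"fmt"
)

// dwarfGoLanguage is DW_LANG_Go, as quoted from Delve and DWARF v5.
const dwarfGoLanguage = 22

// entryReader is satisfied by *dwarf.Reader.
type entryReader interface {
	Next() (*dwarf.Entry, error)
}

// goSubprograms caches only subprograms whose enclosing compile unit has
// DW_AT_language == DW_LANG_Go. Entries come back in tree order, so the most
// recently seen compile unit is the one the current subprogram belongs to.
func goSubprograms(r entryReader) (map[string]uint64, error) {
	cache := make(map[string]uint64)
	inGoUnit := false
	for {
		e, err := r.Next()
		if err != nil {
			return nil, err
		}
		if e == nil {
			return cache, nil
		}
		switch e.Tag {
		case dwarf.TagCompileUnit:
			// debug/dwarf decodes constant-class attributes as int64.
			lang, _ := e.Val(dwarf.AttrLanguage).(int64)
			inGoUnit = lang == dwarfGoLanguage
		case dwarf.TagSubprogram:
			name, okName := e.Val(dwarf.AttrName).(string)
			addr, okAddr := e.Val(dwarf.AttrLowpc).(uint64)
			if inGoUnit && okName && okAddr {
				cache[name] = addr
			}
		}
	}
}

// sliceReader replays pre-built entries (demo/test helper).
type sliceReader struct{ entries []*dwarf.Entry }

func newSliceReader(entries ...*dwarf.Entry) *sliceReader {
	return &sliceReader{entries: entries}
}

func (s *sliceReader) Next() (*dwarf.Entry, error) {
	if len(s.entries) == 0 {
		return nil, nil
	}
	e := s.entries[0]
	s.entries = s.entries[1:]
	return e, nil
}

func compileUnit(lang int64) *dwarf.Entry {
	return &dwarf.Entry{Tag: dwarf.TagCompileUnit, Children: true, Field: []dwarf.Field{
		{Attr: dwarf.AttrLanguage, Val: lang, Class: dwarf.ClassConstant},
	}}
}

func subprogram(name string, lowpc uint64) *dwarf.Entry {
	return &dwarf.Entry{Tag: dwarf.TagSubprogram, Field: []dwarf.Field{
		{Attr: dwarf.AttrName, Val: name, Class: dwarf.ClassString},
		{Attr: dwarf.AttrLowpc, Val: lowpc, Class: dwarf.ClassAddress},
	}}
}

func main() {
	const dwLangC99 = 0x0c // a C compile unit, for contrast
	cache, err := goSubprograms(newSliceReader(
		compileUnit(dwarfGoLanguage), subprogram("main.f", 0x1000),
		compileUnit(dwLangC99), subprogram("c_helper", 0x2000),
	))
	fmt.Println(cache, err)
}
```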

The real "trick" will be to avoid stalling the whole program while the binary is scanned. It should be possible to run the scan in a background goroutine, with locking such that callers block until the scan is done or, more expediently, return immediately if their symbol has already been cached.
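The background-scan idea can be sketched with a sync.RWMutex guarding the cache and a channel that is closed when the scan completes. The scan callback stands in for the DWARF walk. SymbolCache, NewSymbolCache and Lookup are hypothetical names, not existing Beam API.

```go
package main

import (
	"fmt"
	"sync"
)

// SymbolCache is filled by a background scan. Lookups of symbols that are
// already cached return immediately; other lookups wait for the scan to end.
type SymbolCache struct {
	mu   sync.RWMutex
	syms map[string]uint64
	done chan struct{} // closed when the scan has finished
}

// NewSymbolCache starts scan in its own goroutine. scan stands in for the
// one-time walk of the debug info and reports each symbol via add.
func NewSymbolCache(scan func(add func(name string, addr uint64))) *SymbolCache {
	c := &SymbolCache{syms: make(map[string]uint64), done: make(chan struct{})}
	go func() {
		defer close(c.done)
		scan(func(name string, addr uint64) {
			c.mu.Lock()
			c.syms[name] = addr
			c.mu.Unlock()
		})
	}()
	return c
}

func (c *SymbolCache) get(name string) (uint64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	addr, ok := c.syms[name]
	return addr, ok
}

// Lookup returns the address for name, blocking only if it isn't cached yet.
func (c *SymbolCache) Lookup(name string) (uint64, bool) {
	if addr, ok := c.get(name); ok {
		return addr, true // fast path: the scan has already reached it
	}
	<-c.done // slow path: wait for the full scan, then check once more
	return c.get(name)
}

func main() {
	c := NewSymbolCache(func(add func(string, uint64)) {
		add("main.f", 0x1000)
		add("main.g", 0x2000)
	})
	fmt.Println(c.Lookup("main.g"))
	fmt.Println(c.Lookup("main.missing"))
}
```

Closing the done channel wakes every blocked lookup at once. That fits a one-time event like the end of the scan.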
Alternatively, we could set up a channel-based notification scheme: when an entry is added, anything waiting for that symbol is unblocked immediately instead of waiting for the whole scan to finish. That may be more overhead than strictly necessary for a one-time scan of the binary, though.
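Here is one possible design for the per-symbol notification alternative, with a buffered channel for each pending lookup. It uses the same hypothetical scan callback as a stand-in for the DWARF walk. NotifyingCache and its methods are illustrative names only.

```go
package main

import (
	"fmt"
	"sync"
)

// NotifyingCache lets a lookup wake up as soon as its own symbol is added,
// instead of waiting for the whole scan to finish.
type NotifyingCache struct {
	mu      sync.Mutex
	syms    map[string]uint64
	waiters map[string][]chan uint64 // pending lookups, keyed by symbol name
	done    bool                     // true once the scan has finished
}

func NewNotifyingCache(scan func(add func(name string, addr uint64))) *NotifyingCache {
	c := &NotifyingCache{syms: make(map[string]uint64), waiters: make(map[string][]chan uint64)}
	go func() {
		scan(c.add)
		c.finish()
	}()
	return c
}

// add records a symbol and hands its address to everyone waiting for it.
func (c *NotifyingCache) add(name string, addr uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.syms[name] = addr
	for _, ch := range c.waiters[name] {
		ch <- addr // buffered (cap 1), so this never blocks under the lock
		close(ch)
	}
	delete(c.waiters, name)
}

// finish closes every remaining waiter channel: those symbols were never found.
func (c *NotifyingCache) finish() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.done = true
	for _, chs := range c.waiters {
		for _, ch := range chs {
			close(ch)
		}
	}
	c.waiters = nil
}

// Lookup returns at once on a hit or after the scan has ended; otherwise it
// blocks only until this particular symbol is added (or the scan ends).
func (c *NotifyingCache) Lookup(name string) (uint64, bool) {
	c.mu.Lock()
	if addr, ok := c.syms[name]; ok {
		c.mu.Unlock()
		return addr, true
	}
	if c.done {
		c.mu.Unlock()
		return 0, false
	}
	ch := make(chan uint64, 1)
	c.waiters[name] = append(c.waiters[name], ch)
	c.mu.Unlock()
	addr, ok := <-ch // ok is false if the channel was closed without a value
	return addr, ok
}

func main() {
	release := make(chan struct{})
	c := NewNotifyingCache(func(add func(string, uint64)) {
		add("main.early", 0x1000)
		<-release // simulate a slow remainder of the scan
		add("main.late", 0x2000)
	})
	go func() { close(release) }()
	fmt.Println(c.Lookup("main.late"))
	fmt.Println(c.Lookup("main.early"))
	fmt.Println(c.Lookup("main.missing"))
}
```

The waiter map and the channel per pending lookup are the extra bookkeeping. The comment weighs that overhead against simply waiting for the one-time scan.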

> [Go SDK] Beam should not retain the symbol table after function resolution
> --------------------------------------------------------------------------
>
>                 Key: BEAM-4852
>                 URL: https://issues.apache.org/jira/browse/BEAM-4852
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In some instances, the Beam Go SDK needs to introspect the binary's symbol table to resolve functions. However, it may be possible to cache these results for all applicable functions and then allow the table to be garbage collected.
> The table represents a large heap cost that is retained for the lifetime of a job.
> A secondary goal would be to avoid incurring the cost entirely when a job has nothing to look up, e.g. for unit tests or ancillary uses of the Beam SDK. (For example, a project migrating from some other system to Beam shouldn't pay the cost while the old system is still in use just because Beam is linked in and selected by a runtime switch.)
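Both goals in the description can be sketched with sync.Once: paying nothing when no lookup happens, and keeping only the compact results afterwards. The loader function is a hypothetical stand-in for the DWARF scan. LazySymbols is an illustrative name, not the SDK's type.

```go
package main

import (
	"fmt"
	"sync"
)

// LazySymbols defers the expensive symbol-table load until the first lookup,
// so jobs (or tests) that never resolve a function never pay for it.
type LazySymbols struct {
	once sync.Once
	load func() (map[string]uint64, error) // hypothetical loader, e.g. a DWARF scan
	syms map[string]uint64
	err  error
}

func NewLazySymbols(load func() (map[string]uint64, error)) *LazySymbols {
	return &LazySymbols{load: load}
}

// Lookup triggers the load on first use. Afterwards only the compact map is
// kept; the loader (and anything it captured) is dropped for the GC.
func (l *LazySymbols) Lookup(name string) (uint64, error) {
	l.once.Do(func() {
		l.syms, l.err = l.load()
		l.load = nil
	})
	if l.err != nil {
		return 0, l.err
	}
	addr, ok := l.syms[name]
	if !ok {
		return 0, fmt.Errorf("symbol %q not found", name)
	}
	return addr, nil
}

func main() {
	loads := 0
	syms := NewLazySymbols(func() (map[string]uint64, error) {
		loads++
		return map[string]uint64{"main.f": 0x1000}, nil
	})
	fmt.Println("loads before first lookup:", loads)
	fmt.Println(syms.Lookup("main.f"))
	fmt.Println(syms.Lookup("main.g"))
	fmt.Println("loads after two lookups:", loads)
}
```

If the loader only holds the *dwarf.Data as a local variable, that data becomes unreachable once the load returns. Only the map then survives for the lifetime of the job.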



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)