Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/06/17 14:59:00 UTC

[jira] [Commented] (IMPALA-8542) Access trace collection for data cache

    [ https://issues.apache.org/jira/browse/IMPALA-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138531#comment-17138531 ] 

ASF subversion and git services commented on IMPALA-8542:
---------------------------------------------------------

Commit d38e4d10de98d46c4b5b6a4f5ef0a9b9bbb61dae in impala's branch refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d38e4d1 ]

IMPALA-9435: Usability enhancements for data cache access trace

The data cache access trace was added in IMPALA-8542 as a way
to capture a workload's cache accesses to allow later analysis.

This modifies the data cache access trace to improve usability:
1. The access trace now uses a SimpleLogger to limit the total
   number of trace entries per file and total number of trace
   files. This caps the disk usage for the access trace. The
   behavior is controlled by the data_cache_trace_dir,
   max_data_cache_trace_file_size, and max_data_cache_trace_files
   startup parameters.
2. This introduces the data_cache_trace_percentage parameter, which
   allows tracing only a subset of the entries produced. It traces
   accesses for a consistent subset of the cache (i.e. accesses
   for a filename/mtime/offset are either always traced or
   never traced). Because every access to a traced key is captured,
   this allows for better analysis than a random sample of
   individual accesses. Tracing a subset of accesses can reduce the
   performance overhead of tracing. It also provides a way to trace
   a longer time period in the same number of entries.
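
The consistent-subset sampling in item 2 can be illustrated with a
minimal Python sketch. It assumes the decision is made by hashing the
filename/mtime/offset key into 100 buckets and tracing the key when its
bucket falls below the percentage. The function name should_trace and
the use of SHA-256 are illustrative choices, not Impala's actual
implementation.

```python
import hashlib


def should_trace(filename: str, mtime: int, offset: int, trace_percentage: int) -> bool:
    """Hypothetical helper: decide whether accesses to this cache key are traced.

    The key is hashed with SHA-256 (stable across processes, unlike Python's
    built-in hash() for str) and mapped to a bucket in [0, 100). Every access
    to the same filename/mtime/offset lands in the same bucket, so the key is
    either always traced or never traced.
    """
    if trace_percentage >= 100:
        return True
    if trace_percentage <= 0:
        return False
    key = f"{filename}\x00{mtime}\x00{offset}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < trace_percentage


if __name__ == "__main__":
    # Hypothetical file and mtime; offsets step through 8 KiB blocks.
    keys = [("/warehouse/t/part-0.parq", 1592400000, off)
            for off in range(0, 10_000 * 8192, 8192)]
    traced = sum(should_trace(f, m, o, 25) for f, m, o in keys)
    print(f"traced {traced} of {len(keys)} keys at 25%")
```

Because the bucket depends only on the key, a key's decision never
changes between accesses, and the keys traced at a lower percentage
are a subset of those traced at a higher one.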

This also implements the ability to replay traces against a
specific cache configuration. The replayer can produce JSON output
with cache hit/miss information for the original trace and the
replay. This provides a building block for analyses that compare
different cache sizes or cache eviction policies.
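
A replay of this kind can be sketched in a few lines of Python. This
is a minimal sketch, assuming a hypothetical trace record (TraceEntry)
that carries the cache key, the access length, and whether the original
access hit, and using LRU purely as one example eviction policy. The
record layout, the replay_lru function, and the JSON shape are
illustrative; they are not the data-cache-trace-replayer's actual
format or interface.

```python
import json
from collections import OrderedDict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TraceEntry:
    # Hypothetical record layout for one traced cache lookup.
    filename: str
    mtime: int
    offset: int
    length: int
    original_hit: bool  # outcome recorded when the trace was captured


def replay_lru(entries: Iterable[TraceEntry], capacity_bytes: int) -> dict:
    """Replay a trace against an LRU cache with the given byte capacity."""
    cache: OrderedDict = OrderedDict()  # key -> length, least recently used first
    used = 0
    original_hits = replay_hits = total = 0
    for e in entries:
        total += 1
        original_hits += e.original_hit
        key = (e.filename, e.mtime, e.offset)
        if key in cache:
            replay_hits += 1
            cache.move_to_end(key)  # mark as most recently used
            continue
        # Miss: insert the entry unless it can never fit in the cache.
        if e.length > capacity_bytes:
            continue
        while used + e.length > capacity_bytes:
            _, evicted_len = cache.popitem(last=False)  # evict least recently used
            used -= evicted_len
        cache[key] = e.length
        used += e.length
    return {
        "original": {"hits": original_hits, "misses": total - original_hits},
        "replay": {"hits": replay_hits, "misses": total - replay_hits},
    }


if __name__ == "__main__":
    def access(name: str, hit: bool) -> TraceEntry:
        return TraceEntry(name, 0, 0, 100, hit)

    trace = [access("a", False), access("b", False), access("a", True),
             access("c", False), access("a", True), access("b", False)]
    for capacity in (100, 200, 300):
        print(capacity, json.dumps(replay_lru(trace, capacity)))
```

Sweeping capacity_bytes over the same trace gives the "what if"
comparison of cache sizes; replacing replay_lru with a different policy
would compare eviction policies on identical input.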

Testing:
 - New backend tests in data-cache-test, data-cache-trace-test
 - Manual testing of the data-cache-trace-replayer

Change-Id: I0f84204d8e5145f5fa8d4851d9c19ac317db168e
Reviewed-on: http://gerrit.cloudera.org:8080/15914
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Access trace collection for data cache
> --------------------------------------
>
>                 Key: IMPALA-8542
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8542
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>             Fix For: Impala 3.3.0
>
>
> Now that we have a remote-read data cache, it would be useful to log an access trace. The trace can then be fed back into various cache policy simulators to compare their relative performance and to do "what if" analysis (e.g. how the hit rate would change with larger or smaller capacities).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
