Posted to issues@kylin.apache.org by "Shaofeng SHI (JIRA)" <ji...@apache.org> on 2017/08/09 05:50:00 UTC
[jira] [Closed] (KYLIN-1506) Refactor resource interface for
timeseries-based data like jobs to much better performance
[ https://issues.apache.org/jira/browse/KYLIN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaofeng SHI closed KYLIN-1506.
-------------------------------
> Refactor resource interface for timeseries-based data like jobs to much better performance
> ------------------------------------------------------------------------------------------
>
> Key: KYLIN-1506
> URL: https://issues.apache.org/jira/browse/KYLIN-1506
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.5.0, v1.4.0, v1.3.0
> Reporter: Hao Chen
> Assignee: Hao Chen
> Labels: patch
>
> h1. Problem
> Currently, operations such as getJobOutputs and getJobs scan the store twice to build a response. For example, the scan always proceeds as follows:
> 1. List the keys, sort them, and take the first and last key (in effect, just a prefix-filter lookup) with "store.listResources(resourcePath)"
> 2. Re-scan that key range with a timestamp filter: "store.getAllResources(startKey, endKey, startTime, endTime, Class, Serializer)"
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
>     try {
>         NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT);
>         if (resources == null || resources.isEmpty()) {
>             return Collections.emptyList();
>         }
>         // Collections.sort(resources);
>         String rangeStart = resources.first();
>         String rangeEnd = resources.last();
>         return store.getAllResources(rangeStart, rangeEnd, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}
> h2. Solution
> In fact, the two scans can simply be combined into one:
> {code}
> store.getAllResources(resourcePath, startTime, endTime, Class, Serializer)
> store.getAllResources(resourcePath, Class, Serializer)
> {code}
> For example, "List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis)" can be refactored as follows:
> {code}
> public List<ExecutableOutputPO> getJobOutputs(long timeStartInMillis, long timeEndInMillis) throws PersistentException {
>     try {
>         return store.getAllResources(ResourceStore.EXECUTE_OUTPUT_RESOURCE_ROOT, timeStartInMillis, timeEndInMillis, ExecutableOutputPO.class, JOB_OUTPUT_SERIALIZER);
>     } catch (IOException e) {
>         logger.error("error get all Jobs:", e);
>         throw new PersistentException(e);
>     }
> }
> {code}
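To make the difference between the two approaches concrete, here is a minimal, self-contained sketch that replaces the real store with an in-memory sorted map. Everything in it is an assumption for illustration: the class `SinglePassScanSketch`, the `scans` counter, the sample paths, and the half-open time range (start inclusive, end exclusive) are hypothetical and are not taken from Kylin's ResourceStore. It only shows that a prefix scan with a timestamp filter can return the same rows as the list-then-rescan approach while issuing one scan instead of two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for a resource store; not Kylin's ResourceStore. */
final class SinglePassScanSketch {

    /** Sorted map of resource path to last-modified timestamp in milliseconds. */
    private final NavigableMap<String, Long> rows = new TreeMap<>();

    /** Number of scans issued so far, used to compare the two approaches. */
    int scans = 0;

    void put(String path, long timestamp) {
        rows.put(path, timestamp);
    }

    /** First scan of the old approach: list every key under the prefix. */
    NavigableSet<String> listResources(String prefix) {
        scans++;
        NavigableSet<String> keys = new TreeSet<>();
        for (String key : rows.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // keys sharing a prefix are contiguous in sorted order
            }
            keys.add(key);
        }
        return keys;
    }

    /** Second scan of the old approach: inclusive key range plus time filter. */
    List<String> getAllResources(String startKey, String endKey, long timeStart, long timeEnd) {
        scans++;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : rows.subMap(startKey, true, endKey, true).entrySet()) {
            if (e.getValue() >= timeStart && e.getValue() < timeEnd) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Combined approach: a single scan applying the prefix and time filters together. */
    List<String> getAllResources(String prefix, long timeStart, long timeEnd) {
        scans++;
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : rows.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            if (e.getValue() >= timeStart && e.getValue() < timeEnd) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** The old two-step flow: list keys, then re-scan between the first and last key. */
    List<String> twoPass(String prefix, long timeStart, long timeEnd) {
        NavigableSet<String> keys = listResources(prefix);
        if (keys.isEmpty()) {
            return new ArrayList<>();
        }
        return getAllResources(keys.first(), keys.last(), timeStart, timeEnd);
    }

    /** Builds a small sample store with made-up paths and timestamps. */
    static SinglePassScanSketch sample() {
        SinglePassScanSketch store = new SinglePassScanSketch();
        store.put("/execute/job-1", 150L);
        store.put("/execute_output/job-1", 100L);
        store.put("/execute_output/job-2", 200L);
        store.put("/execute_output/job-3", 300L);
        return store;
    }

    public static void main(String[] args) {
        SinglePassScanSketch store = sample();
        System.out.println("two-pass: " + store.twoPass("/execute_output/", 150L, 300L));
        System.out.println("scans so far: " + store.scans);
        System.out.println("one-pass: " + store.getAllResources("/execute_output/", 150L, 300L));
        System.out.println("scans so far: " + store.scans);
    }
}
```

In this sketch both calls return the same list, and the scan counter grows by two for the old flow but only by one for the combined call. Against a real backing store, the saving would depend on how expensive each scan is.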
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)