You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@echarts.apache.org by GitBox <gi...@apache.org> on 2021/07/20 21:08:31 UTC

[GitHub] [echarts] 100pah commented on a change in pull request #15355: dataset: fix dataset performance drops signifcantly on high dimensions data.

100pah commented on a change in pull request #15355:
URL: https://github.com/apache/echarts/pull/15355#discussion_r673344421



##########
File path: src/model/Series.ts
##########
@@ -385,14 +391,22 @@ class SeriesModel<Opt extends SeriesOption = SeriesOption> extends ComponentMode
         inner(this).data = data;
     }
 
-    getSource(): Source {
-        return inner(this).sourceManager.getSource();
+    getEncode() {
+        const encode = (this as Model<SeriesEncodeOptionMixin>).get('encode');

Review comment:
       Need to 
   ```ts
   (this as Model<SeriesEncodeOptionMixin>).get('encode', true)
   ```
   ?
   The original code is `.get('encode', true)` and there might be no scenario to set  `encode` on option root and share with different series ?

##########
File path: src/data/helper/sourceManager.ts
##########
@@ -346,8 +350,79 @@ export class SourceManager {
      * @param sourceIndex By defualt 0, means "main source".
      *                    Most cases there is only one source.
      */
-    getSource(sourceIndex?: number) {
-        return this._sourceList[sourceIndex || 0];
+    getSource(sourceIndex?: number): Source {
+        sourceIndex = sourceIndex || 0;
+        const source = this._sourceList[sourceIndex];
+        if (!source) {
+            // Series may share source instance with dataset.
+            const upSourceMgrList = this._getUpstreamSourceManagers();
+            return upSourceMgrList[0]
+                && upSourceMgrList[0].getSource(sourceIndex);
+        }
+        return source;
+    }
+
+    /**
+     * Will return undefined if source don't have dimensions.
+     *
+     * Only available for series.
+     */
+    getDataStorage(): DataStorage | undefined {
+        if (__DEV__) {
+            assert(isSeries(this._sourceHost), 'Can only call getDataStorage on series source manager.');
+        }
+        const source = this.getSource(0);
+        const dimensionsDefine = source.dimensionsDefine;
+        const sourceReadKey = source.seriesLayoutBy
+            + '$$'
+            + source.startIndex
+            + (dimensionsDefine ? map(dimensionsDefine, def => def.name).join('$$') : '');
+
+        return this._innerGetDataStorage(source, sourceReadKey);
+    }
+
+    private _innerGetDataStorage(endSource: Source, sourceReadKey: string): DataStorage | undefined {
+        // TODO Can use other sourceIndex?
+        const sourceIndex = 0;
+
+
+        const source = this.getSource(sourceIndex);
+        const storeList = this._storeList;
+
+        // Source from endpoint(usually series) will be read differently
+        // when seriesLayoutBy or startIndex(which is affected by sourceHeader) are different.
+        // So we use this two props as key. Another fact `dimensions` will be checked when initializing SeriesData.
+        const sourceToInit = (endSource || source);

Review comment:
       The `endSource` seems never be null/undefined, so here `source` will never be used ?

##########
File path: src/data/helper/sourceManager.ts
##########
@@ -346,8 +350,79 @@ export class SourceManager {
      * @param sourceIndex By defualt 0, means "main source".
      *                    Most cases there is only one source.
      */
-    getSource(sourceIndex?: number) {
-        return this._sourceList[sourceIndex || 0];
+    getSource(sourceIndex?: number): Source {
+        sourceIndex = sourceIndex || 0;
+        const source = this._sourceList[sourceIndex];
+        if (!source) {
+            // Series may share source instance with dataset.
+            const upSourceMgrList = this._getUpstreamSourceManagers();

Review comment:
       I think it might not necessary to share `source` instance between `sourceManager`s.
   
   The `class Source` originally try to represents the option of `data`, `dimension`, ... of `series` and `dataset`, where `data`, `dimensions` probably be shared between different `source` instance, but `source` instance itself do not share between different `sourceManager`s.
   
   I think it's seems not necessary to change the understanding above. 
   
   Otherwise it brings some code in Line 237, which are not easy to understand that why there is a different that line 237 only compare `seriesLayoutBy` and `sourceHeader` but line 379 also compare `dimensions`.
   And it also brings code here, which is not easy to think through where the `source` instance from on earth (because of the call of `this._getUpstreamSourceManagers`)
   
   The `dataStore` will be shard between `sourceManager`, it's seems enough in performance optimization?
   
   

##########
File path: src/data/helper/sourceManager.ts
##########
@@ -346,8 +350,79 @@ export class SourceManager {
      * @param sourceIndex By defualt 0, means "main source".
      *                    Most cases there is only one source.
      */
-    getSource(sourceIndex?: number) {
-        return this._sourceList[sourceIndex || 0];
+    getSource(sourceIndex?: number): Source {
+        sourceIndex = sourceIndex || 0;
+        const source = this._sourceList[sourceIndex];
+        if (!source) {
+            // Series may share source instance with dataset.
+            const upSourceMgrList = this._getUpstreamSourceManagers();
+            return upSourceMgrList[0]
+                && upSourceMgrList[0].getSource(sourceIndex);
+        }
+        return source;
+    }
+
+    /**
+     * Will return undefined if source don't have dimensions.
+     *
+     * Only available for series.
+     */
+    getDataStorage(): DataStorage | undefined {
+        if (__DEV__) {
+            assert(isSeries(this._sourceHost), 'Can only call getDataStorage on series source manager.');
+        }
+        const source = this.getSource(0);
+        const dimensionsDefine = source.dimensionsDefine;
+        const sourceReadKey = source.seriesLayoutBy
+            + '$$'
+            + source.startIndex
+            + (dimensionsDefine ? map(dimensionsDefine, def => def.name).join('$$') : '');
+
+        return this._innerGetDataStorage(source, sourceReadKey);
+    }
+
+    private _innerGetDataStorage(endSource: Source, sourceReadKey: string): DataStorage | undefined {
+        // TODO Can use other sourceIndex?
+        const sourceIndex = 0;
+
+
+        const source = this.getSource(sourceIndex);
+        const storeList = this._storeList;
+
+        // Source from endpoint(usually series) will be read differently
+        // when seriesLayoutBy or startIndex(which is affected by sourceHeader) are different.
+        // So we use this two props as key. Another fact `dimensions` will be checked when initializing SeriesData.
+        const sourceToInit = (endSource || source);
+        let cachedStoreMap = storeList[sourceIndex];
+
+        if (!cachedStoreMap) {
+            cachedStoreMap = storeList[sourceIndex] = {};
+        }
+
+        let cachedStore = cachedStoreMap[sourceReadKey];
+        if (!cachedStore) {
+            const upSourceMgr = this._getUpstreamSourceManagers()[0];
+
+            if (isSeries(this._sourceHost) && upSourceMgr) {
+                cachedStore = upSourceMgr._innerGetDataStorage(endSource, sourceReadKey);
+            }
+            else {
+                const dimensionsDefine = source.dimensionsDefine;
+                // Can't create a store if don't know dimension..
+                if (source && dimensionsDefine) {

Review comment:
       Why not `endSource` ? 
   It seem not easy to understand what the `source` and `endSource` are here.

##########
File path: src/chart/helper/createSeriesDataFromArray.ts
##########
@@ -107,26 +99,92 @@ function createListFromArray(source: Source | OptionSourceData, seriesModel: Ser
     if (!hasNameEncode && firstCategoryDimIndex != null) {
         dimInfoList[firstCategoryDimIndex].otherDims.itemName = 0;
     }
+    return firstCategoryDimIndex;
+}
 
-    const stackCalculationInfo = enableDataStack(seriesModel, dimInfoList);
+function createListFromArray(
+    sourceOrStore: Source | OptionSourceData | DataStorage,
+    seriesModel: SeriesModel,
+    opt?: {
+        generateCoord?: string
+        useEncodeDefaulter?: boolean | EncodeDefaulter
+        // By default: auto. If `true`, create inverted indices for all ordinal dimension on coordSys.
+        createInvertedIndices?: boolean
+    }
+): SeriesData {
+    opt = opt || {};
 
-    const list = new List(dimInfoList, seriesModel);
+    if (!isSourceInstance(sourceOrStore) && !isDataStorage(sourceOrStore)) {
+        sourceOrStore = createSourceFromSeriesDataOption(
+            sourceOrStore as OptionSourceData
+        );
+    }
 
-    list.setCalculationInfo(stackCalculationInfo);
+    const source = isDataStorage(sourceOrStore) ? sourceOrStore.getSource() : sourceOrStore;
+    const coordSysInfo = getCoordSysInfoBySeries(seriesModel);
+    const coordSysDimDefs = getCoordSysDimDefs(seriesModel, coordSysInfo);
+    const useEncodeDefaulter = opt.useEncodeDefaulter;
 
-    const dimValueGetter = (firstCategoryDimIndex != null && isNeedCompleteOrdinalData(source))
-        ? function (this: List, itemOpt: any, dimName: string, dataIndex: number, dimIndex: number) {
-            // Use dataIndex as ordinal value in categoryAxis
-            return dimIndex === firstCategoryDimIndex
-                ? dataIndex
-                : this.defaultDimValueGetter(itemOpt, dimName, dataIndex, dimIndex);
-        }
+    // Try to ignore unsed dimensions if sharing a high dimension datastorage
+    // 30 is an experience value.
+    const omitUnusedDimensions = isDataStorage(sourceOrStore) && sourceOrStore.getDimensionCount() > 30;
+    const encodeDefaulter = zrUtil.isFunction(useEncodeDefaulter)
+        ? useEncodeDefaulter
+        : useEncodeDefaulter
+        ? zrUtil.curry(makeSeriesEncodeForAxisCoordSys, coordSysDimDefs, seriesModel)
         : null;
 
-    list.hasItemOption = false;
-    list.initData(source, null, dimValueGetter);
+    const createDimensionOptions = {
+        coordDimensions: coordSysDimDefs,
+        generateCoord: opt.generateCoord,
+        encodeDefine: seriesModel.getEncode()
+            // NOTE: If we call createDimensions on same source multiple times.
+            // It will break the encodeDefaulter which has sideeffects.
+            // So we prepare the default encode here instead of passing encoderDefaulter function.
+            || (encodeDefaulter && encodeDefaulter(
+                source, getDimCount(source, coordSysDimDefs, source.dimensionsDefine || [])
+            )),
+        omitUnusedDimensions
+    };
+    let dimInfoList = createDimensions(sourceOrStore, createDimensionOptions);
+    let firstCategoryDimIndex = injectOrdinalMeta(dimInfoList, opt.createInvertedIndices, coordSysInfo);
+
+    if (omitUnusedDimensions) {
+        // sourceOrStore
+        if (!(sourceOrStore as DataStorage).syncDimensionTypes(dimInfoList)) {
+            dimInfoList = createDimensions(sourceOrStore, zrUtil.extend(createDimensionOptions, {
+                omitUnusedDimensions: true
+            }));
+            // Fallback
+            firstCategoryDimIndex = injectOrdinalMeta(
+                dimInfoList, opt.createInvertedIndices, coordSysInfo
+            );
+            sourceOrStore = source;
+        }
+    }

Review comment:
       Not sure why call them twice rather than 
   ```ts
       if (omitUnusedDimensions && !(sourceOrStore as DataStorage).syncDimensionTypes(dimInfoList)) {
            dimInfoList = createDimensions(...);
            firstCategoryDimIndex = injectOrdinalMeta(...);
            sourceOrStore = source;
       }
       else {
            dimInfoList = createDimensions(...);
            firstCategoryDimIndex = injectOrdinalMeta(...);
       }
   
   ```

##########
File path: src/chart/helper/createSeriesDataFromArray.ts
##########
@@ -107,26 +99,92 @@ function createListFromArray(source: Source | OptionSourceData, seriesModel: Ser
     if (!hasNameEncode && firstCategoryDimIndex != null) {
         dimInfoList[firstCategoryDimIndex].otherDims.itemName = 0;
     }
+    return firstCategoryDimIndex;
+}
 
-    const stackCalculationInfo = enableDataStack(seriesModel, dimInfoList);
+function createListFromArray(
+    sourceOrStore: Source | OptionSourceData | DataStorage,
+    seriesModel: SeriesModel,
+    opt?: {
+        generateCoord?: string
+        useEncodeDefaulter?: boolean | EncodeDefaulter
+        // By default: auto. If `true`, create inverted indices for all ordinal dimension on coordSys.
+        createInvertedIndices?: boolean
+    }
+): SeriesData {
+    opt = opt || {};
 
-    const list = new List(dimInfoList, seriesModel);
+    if (!isSourceInstance(sourceOrStore) && !isDataStorage(sourceOrStore)) {
+        sourceOrStore = createSourceFromSeriesDataOption(
+            sourceOrStore as OptionSourceData
+        );
+    }
 
-    list.setCalculationInfo(stackCalculationInfo);
+    const source = isDataStorage(sourceOrStore) ? sourceOrStore.getSource() : sourceOrStore;
+    const coordSysInfo = getCoordSysInfoBySeries(seriesModel);
+    const coordSysDimDefs = getCoordSysDimDefs(seriesModel, coordSysInfo);
+    const useEncodeDefaulter = opt.useEncodeDefaulter;
 
-    const dimValueGetter = (firstCategoryDimIndex != null && isNeedCompleteOrdinalData(source))
-        ? function (this: List, itemOpt: any, dimName: string, dataIndex: number, dimIndex: number) {
-            // Use dataIndex as ordinal value in categoryAxis
-            return dimIndex === firstCategoryDimIndex
-                ? dataIndex
-                : this.defaultDimValueGetter(itemOpt, dimName, dataIndex, dimIndex);
-        }
+    // Try to ignore unsed dimensions if sharing a high dimension datastorage
+    // 30 is an experience value.
+    const omitUnusedDimensions = isDataStorage(sourceOrStore) && sourceOrStore.getDimensionCount() > 30;
+    const encodeDefaulter = zrUtil.isFunction(useEncodeDefaulter)
+        ? useEncodeDefaulter
+        : useEncodeDefaulter
+        ? zrUtil.curry(makeSeriesEncodeForAxisCoordSys, coordSysDimDefs, seriesModel)
         : null;
 
-    list.hasItemOption = false;
-    list.initData(source, null, dimValueGetter);
+    const createDimensionOptions = {
+        coordDimensions: coordSysDimDefs,
+        generateCoord: opt.generateCoord,
+        encodeDefine: seriesModel.getEncode()
+            // NOTE: If we call createDimensions on same source multiple times.
+            // It will break the encodeDefaulter which has sideeffects.

Review comment:
       Yes I've understood that the side-effect is indeed annoying.

##########
File path: src/data/helper/createDimensions.ts
##########
@@ -44,33 +51,350 @@ export type CoordDimensionDefinitionLoose = CoordDimensionDefinition['name'] | C
 
 export type CreateDimensionsParams = {
     coordDimensions?: CoordDimensionDefinitionLoose[],
+    /**
+     * Will use `source.dimensionsDefine` if not given.
+     */
     dimensionsDefine?: DimensionDefinitionLoose[],
+    /**
+     * Will use `source.encodeDefine` if not given.
+     */
     encodeDefine?: HashMap<OptionEncodeValue, DimensionName> | OptionEncode,
     dimensionsCount?: number,
+    /**
+     * Make default encode if user not specified.
+     */
     encodeDefaulter?: EncodeDefaulter,
     generateCoord?: string,
-    generateCoordCount?: number
+    generateCoordCount?: number,
+
+    /**
+     * If omit unused dimension
+     * Used to improve the performance on high dimension data.
+     */
+    omitUnusedDimensions?: boolean
 };
 
 /**
- * @param opt.coordDimensions
- * @param opt.dimensionsDefine By default `source.dimensionsDefine` Overwrite source define.
- * @param opt.encodeDefine By default `source.encodeDefine` Overwrite source define.
- * @param opt.encodeDefaulter Make default encode if user not specified.
+ * This method builds the relationship between:
+ * + "what the coord sys or series requires (see `sysDims`)",
+ * + "what the user defines (in `encode` and `dimensions`, see `opt.dimsDef` and `opt.encodeDef`)"
+ * + "what the data source provids (see `source`)".
+ *
+ * Some guess strategy will be adapted if user does not define something.
+ * If no 'value' dimension specified, the first no-named dimension will be
+ * named as 'value'.

Review comment:
       In comment above:
   `sysDims` =renamed=> `opt.coordDimensions`.
   `opt.dimsDef` =renamed=> `opt.dimensionsDefine`
   `opt.encodeDefine` =renamed=> `opt.encodeDefine`

##########
File path: src/data/helper/createDimensions.ts
##########
@@ -44,33 +51,350 @@ export type CoordDimensionDefinitionLoose = CoordDimensionDefinition['name'] | C
 
 export type CreateDimensionsParams = {
     coordDimensions?: CoordDimensionDefinitionLoose[],
+    /**
+     * Will use `source.dimensionsDefine` if not given.
+     */
     dimensionsDefine?: DimensionDefinitionLoose[],
+    /**
+     * Will use `source.encodeDefine` if not given.
+     */
     encodeDefine?: HashMap<OptionEncodeValue, DimensionName> | OptionEncode,
     dimensionsCount?: number,
+    /**
+     * Make default encode if user not specified.
+     */
     encodeDefaulter?: EncodeDefaulter,
     generateCoord?: string,
-    generateCoordCount?: number
+    generateCoordCount?: number,
+
+    /**
+     * If omit unused dimension
+     * Used to improve the performance on high dimension data.
+     */
+    omitUnusedDimensions?: boolean
 };
 
 /**
- * @param opt.coordDimensions
- * @param opt.dimensionsDefine By default `source.dimensionsDefine` Overwrite source define.
- * @param opt.encodeDefine By default `source.encodeDefine` Overwrite source define.
- * @param opt.encodeDefaulter Make default encode if user not specified.
+ * This method builds the relationship between:
+ * + "what the coord sys or series requires (see `sysDims`)",
+ * + "what the user defines (in `encode` and `dimensions`, see `opt.dimsDef` and `opt.encodeDef`)"
+ * + "what the data source provids (see `source`)".
+ *
+ * Some guess strategy will be adapted if user does not define something.
+ * If no 'value' dimension specified, the first no-named dimension will be
+ * named as 'value'.
  */
 export default function createDimensions(
     // TODO: TYPE completeDimensions type
-    source: Source | List | OptionSourceData,
+    source: Source | SeriesData | OptionSourceData | DataStorage,
     opt?: CreateDimensionsParams
 ): DataDimensionInfo[] {
+    if (source instanceof DataStorage) {
+        source = source.getSource();
+    }
+    else if (source instanceof SeriesData) {
+        source = source.getStorage().getSource();
+    }
+    else if (!isSourceInstance(source)) {
+        source = createSourceFromSeriesDataOption(source as OptionSourceData);
+    }
+
     opt = opt || {};
-    return completeDimensions(opt.coordDimensions || [], source, {
-        // FIXME:TS detect whether source then call `.dimensionsDefine` and `.encodeDefine`?
-        dimsDef: opt.dimensionsDefine || (source as Source).dimensionsDefine,
-        encodeDef: opt.encodeDefine || (source as Source).encodeDefine,
-        dimCount: opt.dimensionsCount,
-        encodeDefaulter: opt.encodeDefaulter,
-        generateCoord: opt.generateCoord,
-        generateCoordCount: opt.generateCoordCount
+
+    const sysDims = opt.coordDimensions || [];
+    const dimsDef = opt.dimensionsDefine || source.dimensionsDefine || [];
+    const coordDimNameMap = createHashMap<true, DimensionName>();
+    const result: DataDimensionInfo[] = [];
+    const omitUnusedDimensions = opt.omitUnusedDimensions;
+    const isUsingSourceDimensionsDef = dimsDef === source.dimensionsDefine;
+    // Try to cache the dimNameMap if the dimensionsDefine is from source.
+    const canCacheDimNameMap = (isUsingSourceDimensionsDef && omitUnusedDimensions);
+    let dataDimNameMap = canCacheDimNameMap && inner(source).dimNameMap;

Review comment:
       It depends on that `source` instance is immutable. Do we need to add a comment on `class Source` to caution that keep it  immutable in future?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@echarts.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@echarts.apache.org
For additional commands, e-mail: commits-help@echarts.apache.org