Posted to notifications@asterixdb.apache.org by "Hussain Towaileb (Jira)" <ji...@apache.org> on 2022/09/07 12:32:00 UTC

[jira] [Created] (ASTERIXDB-3073) Dynamic Prefixes for External Datasets

Hussain Towaileb created ASTERIXDB-3073:
-------------------------------------------

             Summary: Dynamic Prefixes for External Datasets
                 Key: ASTERIXDB-3073
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3073
             Project: Apache AsterixDB
          Issue Type: Epic
          Components: EXT - External data
    Affects Versions: 0.9.8
            Reporter: Hussain Towaileb
            Assignee: Hussain Towaileb
             Fix For: 0.9.9


Currently, when a user creates an external dataset, a prefix can be provided that directs the external dataset to the location the files are read from. This has a major impact on performance, as it allows us to read only the files we are interested in and avoid reading unnecessary files.

However, a limitation of the current implementation is that the prefix is always a static path. This makes it hard to read, for example, only the files of all userId > 1 or only the files of userId IN [1, 2, 3]. In such scenarios we always end up reading all the files, which can be a very expensive operation, and then use our WHERE clause to get the desired result.

This feature aims to support a more dynamic approach to allow for a flexible prefix that can support different scenarios (for example, the user passing the desired userId in the prefix instead of a single prefix value) and still maintain the behavior of reading the minimal number of files.
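
To illustrate the idea only, here is a minimal Python sketch of how a prefix with a placeholder could be used to prune files before they are read. Everything in it is an assumption made for illustration: the "data/{userId}/" template syntax, the helper names compile_prefix and select_files, and the sample file keys are hypothetical and do not represent AsterixDB's syntax or implementation.

```python
import re


def compile_prefix(template):
    """Turn a hypothetical template such as 'data/{userId}/' into a regex
    with one named group per placeholder."""
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Literal path text between placeholders.
            pattern += re.escape(part)
        else:
            # Placeholder: capture exactly one path segment.
            pattern += f"(?P<{part}>[^/]+)"
    return re.compile(pattern)


def select_files(keys, template, predicate):
    """Keep only the keys whose path fields satisfy the predicate,
    so the other files never need to be opened."""
    matcher = compile_prefix(template)
    selected = []
    for key in keys:
        match = matcher.match(key)
        if match and predicate(match.groupdict()):
            selected.append(key)
    return selected


# Hypothetical object keys in an external store.
keys = [
    "data/1/a.json",
    "data/2/b.json",
    "data/3/c.json",
    "data/7/d.json",
    "logs/1/e.json",
]

# userId IN [1, 2, 3]
in_list = select_files(keys, "data/{userId}/", lambda f: int(f["userId"]) in {1, 2, 3})

# userId > 1
greater = select_files(keys, "data/{userId}/", lambda f: int(f["userId"]) > 1)
```

In this sketch the filter is evaluated against values extracted from the file path, so files outside the requested userId values are skipped rather than read and then discarded by the WHERE clause.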



--
This message was sent by Atlassian Jira
(v8.20.10#820010)