Posted to issues@drill.apache.org by "Bowen Ding (Jira)" <ji...@apache.org> on 2020/06/26 14:06:00 UTC

[jira] [Created] (DRILL-7755) Storage IPFS: Cost estimation and query optimization

Bowen Ding created DRILL-7755:
---------------------------------

             Summary: Storage IPFS: Cost estimation and query optimization
                 Key: DRILL-7755
                 URL: https://issues.apache.org/jira/browse/DRILL-7755
             Project: Apache Drill
          Issue Type: Improvement
            Reporter: Bowen Ding
            Assignee: Bowen Ding


The IPFS storage plugin works by first gathering information about the dataset stored on IPFS. Ideally, the dataset is organized as a Merkle tree, where the leaf nodes contain the actual data and the intermediate nodes record the hashes of their child nodes. The plugin needs to know about all the leaf nodes and their providers, i.e. the peers on IPFS that have that piece of data stored.
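
To make the gathering step concrete, here is a minimal sketch in Python of walking such a tree and collecting every leaf together with its size and providers. It is not the plugin's code: the Node class, the in-memory NODES and PROVIDERS tables, and all hashes and peer names are hypothetical stand-ins for what would really be fetched from IPFS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical in-memory model of one node in the dataset's tree."""
    cid: str                                         # content hash of this node
    size: int = 0                                    # payload bytes (leaves only)
    links: List[str] = field(default_factory=list)   # hashes of child nodes


def collect_leaves(
    root: str,
    nodes: Dict[str, Node],
    providers: Dict[str, List[str]],
) -> List[Tuple[str, int, List[str]]]:
    """Return (cid, size, providers) for every leaf reachable from root."""
    leaves = []
    seen = set()
    stack = [root]
    while stack:
        cid = stack.pop()
        if cid in seen:          # a shared subtree is visited only once
            continue
        seen.add(cid)
        node = nodes[cid]
        if node.links:
            # Intermediate node: descend into children, left to right.
            stack.extend(reversed(node.links))
        else:
            # Leaf node: record its size and the peers that hold it.
            leaves.append((cid, node.size, providers.get(cid, [])))
    return leaves


# Made-up example data (assumption, for illustration only).
NODES = {
    "root": Node("root", links=["a", "b"]),
    "a": Node("a", links=["a1", "a2"]),
    "b": Node("b", size=300),
    "a1": Node("a1", size=100),
    "a2": Node("a2", size=200),
}
PROVIDERS = {"a1": ["peer1"], "a2": ["peer1", "peer2"], "b": ["peer3"]}

LEAVES = collect_leaves("root", NODES, PROVIDERS)
```

With the example data, LEAVES holds three entries (a1, a2 and b), which already shows that leaves of one dataset can differ both in size and in the number of providers.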

The number and size of the leaf nodes may vary across datasets, or even within a single dataset. This information is useful for estimating the cost of the scan operation. We need to find a way to pass this information to Drill's planner.
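
As a rough illustration of the kind of estimate meant here, the Python sketch below turns a list of leaf sizes into a leaf count, a total byte count and an approximate row count. The ScanEstimate type, the estimate_scan function and the avg_row_bytes parameter are assumptions made for this example; how such numbers would actually be handed to Drill's planner is exactly the open question of this issue and is not shown.

```python
from typing import Iterable, NamedTuple


class ScanEstimate(NamedTuple):
    """Hypothetical summary of what a scan over the dataset would touch."""
    leaf_count: int    # number of leaf nodes to read
    total_bytes: int   # sum of all leaf payload sizes
    est_rows: int      # rough row count derived from total_bytes


def estimate_scan(leaf_sizes: Iterable[int], avg_row_bytes: int) -> ScanEstimate:
    """Estimate scan cost from leaf sizes.

    avg_row_bytes is an assumed average encoded row width; a real plugin
    would have to sample the data or take it from configuration.
    """
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    sizes = list(leaf_sizes)
    total = sum(sizes)
    # Ceiling division, so a partially filled last row still counts.
    rows = -(-total // avg_row_bytes)
    return ScanEstimate(len(sizes), total, rows)


# Three leaves of 100, 200 and 300 bytes, assuming 50-byte rows.
ESTIMATE = estimate_scan([100, 200, 300], avg_row_bytes=50)
```

Here ESTIMATE is ScanEstimate(leaf_count=3, total_bytes=600, est_rows=12). The leaf count hints at how far the scan could be parallelized, while the byte and row totals are the sort of figures a cost-based planner typically compares between plans.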



--
This message was sent by Atlassian Jira
(v8.3.4#803005)