Posted to issues@drill.apache.org by "Bowen Ding (Jira)" <ji...@apache.org> on 2020/06/26 14:06:00 UTC
[jira] [Created] (DRILL-7755) Storage IPFS: Cost estimation and query optimization
Bowen Ding created DRILL-7755:
---------------------------------
Summary: Storage IPFS: Cost estimation and query optimization
Key: DRILL-7755
URL: https://issues.apache.org/jira/browse/DRILL-7755
Project: Apache Drill
Issue Type: Improvement
Reporter: Bowen Ding
Assignee: Bowen Ding
The IPFS storage plugin works by first gathering information about the dataset stored on IPFS. Ideally, the dataset is organized as a Merkle tree, where the leaf nodes contain the actual data and the intermediate nodes record the hashes of the leaf nodes. The plugin needs to know about every leaf node and its providers, i.e. the peers on IPFS that have that piece of data stored.
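The gathering step described above can be pictured as a depth-first walk that collects the hashes of all leaf nodes. The following is a minimal sketch in plain Java, not the plugin's actual code: the `Node` class and the `collectLeaves` method are hypothetical names introduced for illustration, the tree is held in memory, and the provider lookup (which would require querying the IPFS network) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IpfsLeafWalkSketch {

    /** Hypothetical in-memory node: a hash plus links to child nodes. */
    static final class Node {
        final String hash;
        final List<Node> children;

        Node(String hash, Node... children) {
            this.hash = hash;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Returns the hashes of all leaf nodes under root, in depth-first order. */
    static List<String> collectLeaves(Node root) {
        List<String> leaves = new ArrayList<>();
        walk(root, leaves);
        return leaves;
    }

    private static void walk(Node node, List<String> leaves) {
        if (node.isLeaf()) {
            leaves.add(node.hash);
            return;
        }
        for (Node child : node.children) {
            walk(child, leaves);
        }
    }

    public static void main(String[] args) {
        // Made-up hashes, used only to show the shape of the tree.
        Node root = new Node("root",
                new Node("leaf-a"),
                new Node("mid", new Node("leaf-b"), new Node("leaf-c")));
        System.out.println(collectLeaves(root));
    }
}
```

In a real plugin each collected hash would then be resolved to its list of providers; that step is omitted here because it depends on the IPFS client in use.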
The number and sizes of the leaf nodes can vary between datasets, or even within a single dataset. This information is useful for estimating the cost of the scan operation, so we need a way to pass it to Drill's planner.
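One possible way to turn leaf counts and sizes into numbers a planner could use is sketched below. This is an illustration under stated assumptions, not the proposed implementation: `Leaf`, `Estimate`, and `estimate` are hypothetical names, the row count is derived from an assumed average row width, and the I/O cost is simply the total number of bytes to fetch. How these values would be handed to Drill's planner is not shown.

```java
import java.util.Arrays;
import java.util.List;

public class IpfsScanCostSketch {

    /** Hypothetical description of one leaf node; not a Drill or IPFS type. */
    static final class Leaf {
        final String hash;
        final long sizeBytes;

        Leaf(String hash, long sizeBytes) {
            this.hash = hash;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Hypothetical holder for the estimated row count and I/O cost. */
    static final class Estimate {
        final double rowCount;
        final double ioCost;

        Estimate(double rowCount, double ioCost) {
            this.rowCount = rowCount;
            this.ioCost = ioCost;
        }
    }

    /**
     * Sums the leaf sizes and divides by an assumed average row width.
     * assumedBytesPerRow is a guess supplied by the caller, not a measured value.
     */
    static Estimate estimate(List<Leaf> leaves, double assumedBytesPerRow) {
        if (assumedBytesPerRow <= 0) {
            throw new IllegalArgumentException("assumedBytesPerRow must be positive");
        }
        long totalBytes = 0;
        for (Leaf leaf : leaves) {
            totalBytes += leaf.sizeBytes;
        }
        return new Estimate(totalBytes / assumedBytesPerRow, totalBytes);
    }

    public static void main(String[] args) {
        // Made-up leaf sizes, for illustration only.
        List<Leaf> leaves = Arrays.asList(new Leaf("leaf-a", 1000), new Leaf("leaf-b", 3000));
        Estimate e = estimate(leaves, 100.0);
        System.out.println(e.rowCount + " rows, " + e.ioCost + " bytes");
    }
}
```

A fuller model might also weigh the number of providers per leaf or network latency; those factors are left out to keep the sketch small.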
--
This message was sent by Atlassian Jira
(v8.3.4#803005)