Posted to dev@mahout.apache.org by "Andrew Musselman (Jira)" <ji...@apache.org> on 2022/01/07 21:48:00 UTC

[jira] [Created] (MAHOUT-2142) Discussion and planning epic for adding blockchain data sources and analytics use cases

Andrew Musselman created MAHOUT-2142:
----------------------------------------

             Summary: Discussion and planning epic for adding blockchain data sources and analytics use cases
                 Key: MAHOUT-2142
                 URL: https://issues.apache.org/jira/browse/MAHOUT-2142
             Project: Mahout
          Issue Type: Epic
            Reporter: Andrew Musselman
            Assignee: Andrew Musselman


*About*

The proposal is to add a new kind of data source, namely any number of Ethereum-compatible ledgers, and to pick a few compelling use cases to build out this year.

We will add children to this epic for specific work items.

*Example Use Cases*
 # Search indexes for given ledgers
 # Computed similarity between accounts on the same ledger, based on activity history
 # Time-series analysis of gas (transaction) fees across multiple ledgers
 # Time-series analysis of transaction counts (overall number per week/month/year/custom period, per user account, etc.) for a list of ledgers, as a comparative analysis of usage
 # Max/min range of transactions for different ledgers
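
To make the use cases above a little more concrete, here is a minimal Python sketch of two of them: per-week transaction counts with fee totals and min/max ranges, and a cosine similarity between two accounts based on who they send to. The record layout (sender, recipient, Unix timestamp, fee), the addresses, and the function names are all hypothetical and chosen only for illustration. Nothing here reflects an existing Mahout API, and cosine similarity is just one possible choice of measure.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from math import sqrt

# Hypothetical, hand-made records: (sender, recipient, unix_timestamp, fee).
# Real ledger data would carry many more fields.
TRANSACTIONS = [
    ("0xaaa", "0xccc", 1641168000, 30),  # 2022-01-03 UTC
    ("0xaaa", "0xddd", 1641254400, 50),  # 2022-01-04 UTC
    ("0xbbb", "0xccc", 1641772800, 20),  # 2022-01-10 UTC
    ("0xbbb", "0xddd", 1641859200, 40),  # 2022-01-11 UTC
    ("0xbbb", "0xeee", 1641945600, 60),  # 2022-01-12 UTC
]


def weekly_stats(transactions):
    """Group fees by ISO (year, week) and summarize each bucket."""
    buckets = defaultdict(list)
    for _sender, _recipient, timestamp, fee in transactions:
        iso = datetime.fromtimestamp(timestamp, tz=timezone.utc).isocalendar()
        buckets[(iso[0], iso[1])].append(fee)
    return {
        week: {
            "count": len(fees),
            "total_fee": sum(fees),
            "min_fee": min(fees),
            "max_fee": max(fees),
        }
        for week, fees in sorted(buckets.items())
    }


def activity_vector(transactions, account):
    """Count how often `account` sent a transaction to each recipient."""
    return Counter(
        recipient
        for sender, recipient, _timestamp, _fee in transactions
        if sender == account
    )


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    for week, stats in weekly_stats(TRANSACTIONS).items():
        print(week, stats)
    similarity = cosine_similarity(
        activity_vector(TRANSACTIONS, "0xaaa"),
        activity_vector(TRANSACTIONS, "0xbbb"),
    )
    print("similarity:", round(similarity, 4))
```

Running the same aggregation per ledger and comparing the resulting tables would give the cross-ledger comparison mentioned in the list.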

 
*How to Get Started*
To explore ledger operations and data, get a copy of go-ethereum (geth: [https://geth.ethereum.org/docs/install-and-build/installing-geth]) and run it against a network to download all historical records. The Goerli test network's entire three years of data is only 32GB, so there are data sets small enough to play with. By default the data files are stored on your local disk under ~/.ethereum (on Linux).
 
Libraries such as web3.js ([https://web3js.readthedocs.io/en/v1.5.2/]) and Web3.py ([https://web3py.readthedocs.io/en/stable/]) interact live with any given ledger, so reading data out of ledgers is straightforward.
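
For a sense of what these libraries do underneath, the sketch below builds a raw Ethereum JSON-RPC request for one block and converts a few hex-encoded fields of the response into integers, using only the Python standard library. `eth_getBlockByNumber` is a standard JSON-RPC method. The helper names and the sample response are made up for illustration, and no node is contacted. In practice one would more likely let Web3.py or web3.js do this.

```python
import json


def block_request(block_number, full_transactions=True, request_id=1):
    """Build the JSON-RPC body asking a node for one block by number."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "method": "eth_getBlockByNumber",
            # Quantities are hex strings; the flag asks for full transaction objects.
            "params": [hex(block_number), full_transactions],
            "id": request_id,
        }
    )


def parse_block(result):
    """Convert a few hex-encoded fields of a block result into integers."""
    return {
        "number": int(result["number"], 16),
        "timestamp": int(result["timestamp"], 16),
        "gas_used": int(result["gasUsed"], 16),
        "tx_count": len(result["transactions"]),
    }


# Made-up sample of the "result" part of a response, trimmed to four fields.
SAMPLE_RESULT = {
    "number": "0x10",
    "timestamp": "0x61d8b6a0",
    "gasUsed": "0x5208",
    "transactions": [],
}

if __name__ == "__main__":
    print(block_request(16))
    print(parse_block(SAMPLE_RESULT))
```

The request body would be sent as an HTTP POST to a node's RPC endpoint. The parsed dictionaries are the kind of flat records an indexer could consume.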
 
Reading and indexing the actual data might mean writing custom parsers for Mahout and Lucene, and possibly decompiling contract bytecode back into readable Solidity code, so there are pieces we would need to plan out.
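
As a small illustration of what a custom parser might involve, the sketch below splits a transaction's ABI-encoded input data into its 4-byte function selector and 32-byte argument words, then decodes the one call it recognizes, an ERC-20 `transfer(address,uint256)` with selector `a9059cbb`. This covers only input data for a known function signature. It is not bytecode decompilation, and the function names, the lookup table, and the sample payload are hypothetical.

```python
def split_calldata(data_hex):
    """Split input data into (selector_hex, list of 32-byte argument words)."""
    raw = bytes.fromhex(data_hex[2:] if data_hex.startswith("0x") else data_hex)
    selector = raw[:4].hex()
    words = [raw[i:i + 32] for i in range(4, len(raw), 32)]
    return selector, words


# Tiny lookup table; a real parser would need a much larger signature database.
KNOWN_SELECTORS = {"a9059cbb": "transfer(address,uint256)"}


def decode_transfer(data_hex):
    """Return (recipient, amount) for an ERC-20 transfer call, else None."""
    selector, words = split_calldata(data_hex)
    if KNOWN_SELECTORS.get(selector) != "transfer(address,uint256)":
        return None
    if len(words) != 2 or any(len(word) != 32 for word in words):
        return None
    recipient = "0x" + words[0][-20:].hex()  # address is the low 20 bytes
    amount = int.from_bytes(words[1], "big")  # uint256, big-endian
    return recipient, amount


# Made-up payload: transfer 1000 units to the address 0x1111...11.
SAMPLE_INPUT = "0xa9059cbb" + ("11" * 20).rjust(64, "0") + format(1000, "064x")

if __name__ == "__main__":
    print(split_calldata(SAMPLE_INPUT)[0])
    print(decode_transfer(SAMPLE_INPUT))
```

Input data whose selector is not in the table returns None here. Deciding how to handle such calls is part of the planning mentioned above.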



--
This message was sent by Atlassian Jira
(v8.20.1#820001)