Posted to dev@mahout.apache.org by "Andrew Musselman (Jira)" <ji...@apache.org> on 2022/01/07 21:42:00 UTC

[jira] [Created] (MAHOUT-2141) Discussion and planning epic for adding blockchain data sources and analytics use cases

Andrew Musselman created MAHOUT-2141:
----------------------------------------

             Summary: Discussion and planning epic for adding blockchain data sources and analytics use cases
                 Key: MAHOUT-2141
                 URL: https://issues.apache.org/jira/browse/MAHOUT-2141
             Project: Mahout
          Issue Type: Epic
            Reporter: Andrew Musselman
            Assignee: Andrew Musselman


*About*

Discussion point for adding Ethereum-compatible blockchains as data sources, along with some pertinent use cases.

We will add stories as children to this epic.

The proposal is to use Ethereum-compatible ledgers, since they adhere to the same standards, for instance for tokens ([https://ethereum.org/en/developers/docs/standards/tokens]) and for smart contracts ([https://ethereum.org/en/developers/docs/smart-contracts]).

*How to Get Started*
To explore concepts and data, get a copy of go-ethereum (geth: [https://geth.ethereum.org/docs/install-and-build/installing-geth]). Run it against a network and it will download that network's historical records to your local disk, by default in ~/.ethereum. The Goerli test network's entire three years of data is only 32GB, so there are data sets small enough to play with.
 
Libraries such as web3.js ([https://web3js.readthedocs.io/en/v1.5.2/]) and Web3.py ([https://web3py.readthedocs.io/en/stable/]) interact live with any given ledger, so reading from a ledger is simple.
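
As a rough illustration of what reading from a ledger involves underneath those libraries, the sketch below uses only the Python standard library to call the standard eth_getBlockByNumber JSON-RPC method. It assumes a local geth node with its HTTP RPC endpoint enabled at http://localhost:8545; that endpoint is an assumption, not something this ticket specifies. The names rpc_payload, parse_block, and fetch_block are hypothetical helpers, not part of Mahout or geth.

```python
import json
import urllib.request

# Assumption: a local geth node with its HTTP RPC endpoint enabled here.
DEFAULT_ENDPOINT = "http://localhost:8545"


def rpc_payload(block_number, full_transactions=False, request_id=1):
    """Build a JSON-RPC 2.0 request body for eth_getBlockByNumber."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBlockByNumber",
        # Block numbers are passed as 0x-prefixed hex strings.
        "params": [hex(block_number), full_transactions],
    }


def parse_block(response):
    """Convert the hex-encoded fields of a block response into Python ints."""
    block = response["result"]
    if block is None:
        raise LookupError("block not found")
    return {
        "number": int(block["number"], 16),
        "timestamp": int(block["timestamp"], 16),
        "gas_used": int(block["gasUsed"], 16),
        "transaction_count": len(block["transactions"]),
    }


def fetch_block(block_number, endpoint=DEFAULT_ENDPOINT):
    """POST the request to the node and return the parsed block summary."""
    body = json.dumps(rpc_payload(block_number)).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return parse_block(json.load(reply))
```

With a node running, fetch_block(1) would return a small dict summarizing block 1. Building and parsing are kept separate from the network call so they can be exercised without a live node.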
 
Reading and indexing the actual data might mean writing custom parsers for Mahout and Lucene, and possibly getting into decompiling bytecode back into readable Solidity code.
 
*Some Starter Discussions*
 * Is there a place to persist each ledger we use as a data source, or would this pull down the ledger data every time for a new instance?
 * Should we build a live demo of this to run on the mahout.a.o web site?

 

*Example Use Cases*
 # Search-indexes of given ledgers
 # Computed similarity to other accounts on the same ledger based on activity history
 # Time-series analysis of gas (transaction) fees across multiple ledgers
 # Time-series analysis of transactions (overall # per week/month/year/custom period, by user account etc.) for a list of ledgers. (Comparative analysis of usage)
 # Max/Min range of transactions for different ledgers
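
To make a few of the use cases above concrete, here is a minimal sketch on toy data. It assumes transactions have already been extracted into (unix timestamp, sender, recipient) tuples; that record layout and all function names are hypothetical. Cosine similarity over counterparties is only one possible similarity measure, picked for illustration, not a design decision for this epic.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime, timezone


def weekly_counts(transactions):
    """Count transactions per ISO (year, week), with timestamps read as UTC."""
    counts = Counter()
    for timestamp, _sender, _recipient in transactions:
        iso = datetime.fromtimestamp(timestamp, tz=timezone.utc).isocalendar()
        counts[(iso[0], iso[1])] += 1
    return counts


def count_range(counts):
    """Return the (min, max) number of transactions seen in any period."""
    values = list(counts.values())
    return min(values), max(values)


def activity_vectors(transactions):
    """Map each sender to a Counter of how often it sent to each recipient."""
    vectors = defaultdict(Counter)
    for _timestamp, sender, recipient in transactions:
        vectors[sender][recipient] += 1
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

Running weekly_counts on each ledger's records and comparing the results would give the comparative usage view; the same bucketing applies to gas fees if a fee column is added to the records.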



--
This message was sent by Atlassian Jira
(v8.20.1#820001)