DAO governance is one of the richest and most under-analyzed datasets in crypto. Every proposal, vote, delegation, and treasury movement is public — and almost none of it is available in a form a researcher can actually use. The data is spread across governance front-ends, subgraphs, block explorers, and forum threads, each with its own schema and none designed to be joined to the others.
This is a guide to what a research-grade DAO governance dataset contains, and how to assemble one from fragmented public sources.
What belongs in a governance dataset
The exact shape depends on the research question, but most serious governance datasets need some combination of the following, keyed to a canonical DAO identity:
- Proposals — title, description, author, status, and the on-chain or off-chain mechanism used.
- Votes — voter address, choice, voting power, and timestamp, so you can reconstruct participation over time.
- Delegates — who holds delegated voting power, how it shifts, and how concentrated it is.
- Treasury — asset composition and flows, which contextualize what governance is actually deciding over.
- Token linkage — the governance token's ticker and contract, so governance activity can be joined to market data.
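One way to pin these components down is as typed records that share a single DAO key. The sketch below is illustrative only: the field names, the `dao_id` key, and the status and mechanism labels are assumptions, not a standard schema, and delegate and treasury records would follow the same pattern.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Dao:
    dao_id: str                    # hypothetical canonical key shared by every table
    name: str
    token_ticker: Optional[str]    # None when the DAO has no liquid token
    token_contract: Optional[str]


@dataclass(frozen=True)
class Proposal:
    dao_id: str
    proposal_id: str
    title: str
    author: str
    status: str                    # assumed labels, e.g. "active", "passed", "defeated"
    mechanism: str                 # assumed labels: "onchain" or "offchain"


@dataclass(frozen=True)
class Vote:
    dao_id: str
    proposal_id: str
    voter: str                     # voter address
    choice: str
    voting_power: float
    cast_at: datetime


def turnout(votes: list[Vote], proposal_id: str) -> float:
    """Total voting power cast on one proposal."""
    return sum(v.voting_power for v in votes if v.proposal_id == proposal_id)
```

Keeping `dao_id` on every record is what lets proposals, votes, and token data be joined later without relying on display names.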
Why it is harder than it looks
The difficulty is not access — it is reconciliation. On-chain governance and off-chain (signature-based) voting live in different systems. A single DAO may have migrated governance frameworks, changed its token, or run parallel processes. Voter addresses need to be resolved to delegates, and delegates to real participation. And the DAO's name in a governance portal rarely matches its ticker in a price feed, which is where entity resolution comes in.
“The difficulty with governance data is not access — it is reconciliation across systems that were never meant to be joined.”
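The name-to-ticker mismatch can be handled with an explicit alias index. This is a minimal sketch, assuming a hand-maintained registry; `REGISTRY`, its keys, the example DAO, and the placeholder contract address are all hypothetical.

```python
from typing import Optional

# Hypothetical canonical registry: one entry per DAO, listing every alias
# under which it appears in the sources being joined.
REGISTRY = {
    "example-dao": {
        "portal_names": {"Example DAO", "ExampleDAO Governance"},
        "tickers": {"EXM"},
        "contracts": {"0x00000000000000000000000000000000000000aa"},
    },
}


def _norm(value: str) -> str:
    """Lowercase and trim so cosmetic differences do not block a match."""
    return value.strip().lower()


def build_alias_index(registry: dict) -> dict[str, str]:
    """Map every known alias (name, ticker, contract) to its canonical DAO id."""
    index: dict[str, str] = {}
    for dao_id, aliases in registry.items():
        for group in aliases.values():
            for alias in group:
                key = _norm(alias)
                # Refuse silent collisions: two DAOs sharing an alias must be
                # resolved by hand, not by whichever entry loads last.
                if key in index and index[key] != dao_id:
                    raise ValueError(f"alias {alias!r} maps to two DAOs")
                index[key] = dao_id
    return index


def resolve(alias: str, index: dict[str, str]) -> Optional[str]:
    """Return the canonical DAO id for an alias, or None if it is unknown."""
    return index.get(_norm(alias))
```

Raising on a collision is a deliberate choice in this sketch: tickers are not unique across tokens, so an ambiguous alias is safer surfaced as an error than resolved implicitly.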
A sourcing approach that holds up
- 01 Define the unit of analysis: is a row a proposal, a vote, a DAO-month, or a delegate? This decides everything downstream.
- 02 Map each DAO to a canonical identity linking its governance framework, token ticker, and contract addresses.
- 03 Pull governance activity from the appropriate source per DAO: subgraphs and APIs where they exist, on-chain events and portal extraction where they do not.
- 04 Reconcile on-chain and off-chain voting into one consistent participation record.
- 05 Normalize, deduplicate, and validate, then document the coverage and known limitations per DAO.
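The reconciliation and deduplication steps can be sketched as a single pass over both vote sources. The raw input keys below are hypothetical, and keeping only the latest vote per voter is an assumption that suits systems where votes can be changed; a DAO with different rules would need a different tie-break.

```python
def normalize_vote(raw: dict, source: str) -> dict:
    """Coerce one raw vote into a shared shape. Input keys are hypothetical."""
    return {
        "dao_id": raw["dao_id"],
        "proposal_id": str(raw["proposal_id"]),
        "voter": raw["voter"].lower(),        # compare addresses case-insensitively
        "choice": raw["choice"],
        "voting_power": float(raw["voting_power"]),
        "timestamp": int(raw["timestamp"]),   # Unix seconds
        "source": source,                     # "onchain" or "offchain"
    }


def reconcile(onchain: list[dict], offchain: list[dict]) -> list[dict]:
    """Merge both sources, keeping the latest vote per (DAO, proposal, voter, source)."""
    latest: dict[tuple, dict] = {}
    for source, rows in (("onchain", onchain), ("offchain", offchain)):
        for raw in rows:
            vote = normalize_vote(raw, source)
            key = (vote["dao_id"], vote["proposal_id"], vote["voter"], source)
            kept = latest.get(key)
            # Exact duplicates collapse here; a later vote replaces an earlier one.
            if kept is None or vote["timestamp"] > kept["timestamp"]:
                latest[key] = vote
    return sorted(latest.values(), key=lambda v: (v["timestamp"], v["voter"]))
```

The `source` field stays on every row so that coverage and limitations can be documented per mechanism rather than only per DAO.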
Joining governance to market data
The most valuable governance datasets do not stop at governance. Once each DAO is mapped to its token, you can join participation and treasury data to historical price and volume, and ask questions that neither dataset answers alone. Does governance activity lead price? How does voter concentration relate to volatility? What happens to participation through a drawdown? That join is only possible if the entity resolution was done carefully at the start.
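As a rough illustration of that join, the sketch below attaches monthly market data to DAO-month participation rows. The row shapes, the `dao_tickers` mapping, and the monthly granularity are assumptions chosen for the example.

```python
from typing import Optional


def join_governance_to_market(
    participation: list[dict],
    dao_tickers: dict[str, Optional[str]],
    prices: list[dict],
) -> list[dict]:
    """Left-join DAO-month participation rows to monthly market rows.

    Participation rows are kept even when no price is found, so gaps in
    market coverage stay visible instead of silently dropping DAOs.
    """
    price_index = {(p["ticker"], p["month"]): p for p in prices}
    joined = []
    for row in participation:
        ticker = dao_tickers.get(row["dao_id"])
        market = price_index.get((ticker, row["month"]))
        joined.append({
            **row,
            "ticker": ticker,
            "close": market["close"] if market else None,
            "volume": market["volume"] if market else None,
        })
    return joined
```

A left join is used here so that a DAO with an unresolved ticker shows up as a row with missing prices, which makes entity-resolution gaps easy to spot.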
Building this well is a data engineering project, not a scraping job. If your research depends on governance data that is clean, joined, and documented — across hundreds of DAOs — that is the kind of dataset we build to order.