January 20, 2024
Imagine paying thousands of dollars for trading signals, only to get variations of simple signals plus noise or overcomplicated nonsense. This is what people face when they try to crowdsource alpha. The questions are: why is crowdsourcing trading signals so difficult, can it actually work, and why would a hedge fund ever share its proprietary datasets?
Crowdsourcing alpha seems straightforward at first. If one data scientist can find profitable patterns, surely a thousand data scientists exploring different approaches could uncover even more alpha.
But that is far from reality. Crowdsourcing alpha isn't just a numbers game. It's about designing the right incentives, measuring the right things, and dealing with participants who will game your system whenever they can.
Several people have run crowdsourcing programs for trading signals with a simple rule: pay contributors for each signal that passed statistical tests and wasn't too correlated with common signals.
What happened? Every single submission fell into two categories:
- variations of simple, well-known signals with noise added, or
- overcomplicated constructions that happened to pass the tests.
The participants weren't being malicious. They were being rational. They optimized exactly what was measured: pass tests, collect money. The programs wanted alpha; contributors delivered creative test-passing.
This reveals the fundamental problem: your metrics need to be long-term, reward deep work over gaming, and stay as close as possible to actual business value.
If you don't require users to have skin in the game, you get adversely selected. Incentives and punishments drive behaviour, and when those incentives are misaligned, you get a pile of correlated garbage instead of genuine alpha.
Then there is the problem of sharing datasets. At the top multi-strategy hedge funds around the world, whole teams source thousands of datasets, clean them, and deploy them for data scientists to extract alpha. These funds are very secretive about the datasets they use.
When you are crowdsourcing alpha research, however, it is in your interest to share your proprietary datasets with participants.
One way to do this is to obfuscate the data. The usual methods are to anonymize symbols, prices, and so on. That is a good start, but there have been instances where participants partially decoded the dataset: see Kaggle's Two Sigma Competition or Kaggle's Jane Street Competition.
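As a rough illustration of what obfuscation can look like (a hypothetical sketch, not any fund's actual pipeline; the column names are made up):

import hashlib
import pandas as pd

def obfuscate(df: pd.DataFrame, feature_cols: list[str]) -> pd.DataFrame:
    # Toy obfuscation: hide identifiers and absolute levels
    out = df.copy()
    # Replace tickers with opaque ids so symbols cannot be looked up directly
    out["symbol"] = out["symbol"].map(lambda s: hashlib.sha256(s.encode()).hexdigest()[:8])
    # Rank-transform each feature so raw prices and volumes are not exposed
    out[feature_cols] = out[feature_cols].rank(pct=True)
    return out

# e.g. obfuscate(prices_df, ["close", "volume"])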
In general, Kaggle-style competitions are not a good way to extract crowdsourced alpha because they are one-off events. The more participants there are, the greater the chance that a few models do well out of sample purely by luck (see the simulation sketch below). There are no follow-ups, no ongoing model-performance monitoring, and no skin in the game. This goes back to the problems of submissions and user behaviour discussed above.
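A quick simulation makes the luck problem concrete (a sketch with made-up numbers and zero-skill participants):

import numpy as np

rng = np.random.default_rng(0)
n_obs = 1_000                        # out-of-sample observations
target = rng.standard_normal(n_obs)

for n_participants in (10, 100, 1_000, 10_000):
    preds = rng.standard_normal((n_participants, n_obs))  # pure-noise "models"
    best = max(np.corrcoef(p, target)[0, 1] for p in preds)
    print(f"{n_participants:>6} participants -> best corr {best:.3f}")
# The best out-of-sample score keeps rising with field size even though nobody has skill.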
Numerai (https://numer.ai), a hedge fund started in 2015, took a fundamentally different approach. Instead of simply asking for signals and paying for whatever passed basic tests, they built a carefully designed incentive system around a few core principles.
Numerai uses Fully Homomorphic Encryption (FHE) to share its proprietary datasets with participants. FHE allows computations to be performed on encrypted data without decrypting it, so privacy is preserved: participants can train and run ML models on the encrypted data and submit their predictions to Numerai.
Numerai requires participants to stake NMR (their native cryptocurrency token) to receive allocation. Payouts are proportional to stake. If your model underperforms, you lose your staked tokens. This single design choice filters out low-effort, overfit submissions more effectively than any statistical test (a toy sketch of the incentive follows below).
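A toy sketch of the shape of the incentive (illustrative only; this is not Numerai's actual payout formula): payouts scale with stake and a clipped score, and a bad round costs real money.

def round_payout(stake: float, score: float, max_frac: float = 0.05) -> float:
    # Win or lose at most max_frac of the stake per round, in proportion to the score
    frac = max(-max_frac, min(max_frac, score))
    return stake * frac

# round_payout(1_000, 0.02)  ->  20.0 (gain)
# round_payout(1_000, -0.03) -> -30.0 (part of the stake is burned)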
Early on, Numerai asked users to predict gross returns or excess returns over an index, then residualized these forecasts to common factors behind the scenes. The problem: signals that aren't pre-residualized become much weaker after this processing. Numerai learned not to make users solve a proxy problem.
They released increasingly sophisticated targets like Cyrus (April 2023) and the Rain dataset (2023) that were aggressively residualized to factors, horizons, and capacity constraints. These targets aligned more closely with the final signals Numerai actually wanted to trade. The lesson: measure and reward the exact target that drives your bottom line.
Users do a poor job creating diversified signals even when you measure and penalize correlation to other users. Correlation is an easy metric to game. Numerai drives diversity at the platform level by providing fundamentally different targets: value, momentum, residualized 20D, residualized 60D, and so on. This guarantees diversity by construction rather than by hope.
In April 2020, Numerai introduced what may be their most important innovation: Meta Model Contribution (MMC). This metric fundamentally changed the game by paying contributors for exactly what Numerai cares about: whether a signal introduces new information the meta model doesn't already have.
Meta Model Contribution measures how much unique value your model adds to Numerai's ensemble after accounting for what all other staked models already provide. It's brilliant because it's extremely difficult to game: you can't just orthogonalize to a handful of known signals; you need to be orthogonal to the stake-weighted ensemble of everything submitted.
MMC can be understood in three equivalent ways, each offering different intuition:
Richard's MMC (from Numerai founder Richard Craib) asks: "Given a model, how much does the Meta Model's correlation with the target change if we increase the model's stake by a small amount?"
\[ MMC = \lim_{\epsilon \to 0} \frac{\text{corr}(((1-\epsilon)m + \epsilon p), y) - \text{corr}(m, y)}{\epsilon} \]
Where \(y\) is the target, \(m\) is the Meta Model, and \(p\) are a model's predictions.
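A quick numerical check of this definition on synthetic data (a sketch; `eps` is a small finite step standing in for the limit):

import numpy as np

rng = np.random.default_rng(42)
n = 10_000
y = rng.standard_normal(n)                        # target
m = 0.6 * y + rng.standard_normal(n)              # meta model, correlated with the target
p = 0.2 * y + 0.5 * m + rng.standard_normal(n)    # one participant's predictions

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

eps = 1e-6
richards_mmc = (corr((1 - eps) * m + eps * p, y) - corr(m, y)) / eps
print(richards_mmc)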
Murky's MMC (derived by community member Murky) uses calculus to arrive at a cleaner formulation:
\[ MMC = y^T \cdot (p - m \cdot (p^T \cdot m) / (m^T \cdot m)) \]
Mike's MMC (the original orthogonalization approach):
\[ MMC = \text{cov}(y, p_{\text{neutral}}) = \text{cov}(y, p - m \cdot (m^\dagger \cdot p)) \]
Here \(m^\dagger\) is the pseudoinverse of \(m\) and \(p_{\text{neutral}}\) is the prediction neutralized with respect to the Meta Model.
Deriving Mike's MMC
Let the orthogonal projection of \(p\) onto the span of \(m\) be \(\text{proj}_m(p) = \beta m\), where \(\beta\) minimizes \(\|p - \beta m\|^2\).
Taking the derivative with respect to \(\beta\) and setting it to zero: \(-2m^T (p - \beta m) = 0\)
\[ m^T p = \beta\, m^T m \quad\Rightarrow\quad \beta = \frac{m^T p}{m^T m} \]
Neutralized predictions:
\[ p_{\text{neutral}} = p - \text{proj}_m(p) = p - \frac{m^T p}{m^T m} m = p - m \cdot (m^\dagger \cdot p) \]
For a vector, the pseudoinverse is \(m^\dagger = \frac{m^T}{m^T m}\), so \(m^\dagger p = \frac{m^T p}{m^T m} = \beta\).
All three formulations are 100% correlated when predictions and meta model are properly normalized using tie-kept ranking and gaussianization.
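A quick sanity check of that equivalence (a sketch using plain z-scoring as a stand-in for Numerai's tie-kept ranking and gaussianization; on standardized inputs the three quantities come out essentially identical):

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def zscore(x):
    return (x - x.mean()) / x.std()

y = zscore(rng.standard_normal(n))                      # target
m = zscore(0.6 * y + rng.standard_normal(n))            # meta model
p = zscore(0.2 * y + 0.5 * m + rng.standard_normal(n))  # a model's predictions

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

eps = 1e-6
richards = (corr((1 - eps) * m + eps * p, y) - corr(m, y)) / eps
murkys = y @ (p - m * (p @ m) / (m @ m))
p_neutral = p - m * ((m / (m @ m)) @ p)   # m_dagger = m^T / (m^T m)
mikes = np.cov(y, p_neutral)[0, 1]

# All three agree up to tiny finite-sample and finite-eps differences
print(richards, murkys / n, mikes)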
The calculation follows these steps:
- Rank (keeping ties) and gaussianize both the predictions and the meta model so each has mean 0 and std 1.
- Orthogonalize the predictions with respect to the meta model.
- Center the live targets.
- Take the covariance between the centered targets and the neutralized predictions.
The core Python implementation, adapted from the Numerai documentation (tie_kept_rank, gaussian, and orthogonalize are helper functions from Numerai's scoring utilities):
import pandas as pd

def contribution(
    predictions: pd.DataFrame,
    meta_model: pd.Series,
    live_targets: pd.Series,
) -> pd.Series:
    # Tie-kept rank, then gaussianize, so each column has mean 0 and std 1
    p = gaussian(tie_kept_rank(predictions)).values
    m = gaussian(tie_kept_rank(meta_model.to_frame()))[meta_model.name].values
    # Orthogonalize the predictions with respect to the meta model
    neutral_preds = orthogonalize(p, m)
    # Center the target without mutating the caller's Series
    centered_targets = live_targets - live_targets.mean()
    # Covariance with the target (a plain dot product works because the mean is 0)
    mmc = (centered_targets @ neutral_preds) / len(centered_targets)
    return pd.Series(mmc, index=predictions.columns)
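A hypothetical call with synthetic data, assuming the helper functions above are available (the index and column names here are made up for illustration):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ids = [f"stock_{i}" for i in range(1_000)]
preds = pd.DataFrame(rng.random((1_000, 2)), index=ids, columns=["model_a", "model_b"])
meta = pd.Series(rng.random(1_000), index=ids, name="meta_model")
targets = pd.Series(rng.random(1_000), index=ids)

print(contribution(preds, meta, targets))   # MMC per column; roughly 0 for random predictions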
MMC is dramatically more stable than previous metrics. Its distribution over time closely resembles standard correlation (CORR), making it reliable for model optimization. More importantly, it's far harder to game than simple correlation thresholds.
When you're evaluated on MMC, you can't succeed by:
- submitting slight variations of common signals plus noise,
- overfitting your way past whatever statistical tests are in place, or
- orthogonalizing to a handful of well-known signals and calling the result unique.
You can only score well by discovering genuinely novel, orthogonal sources of alpha. This is exactly what Numerai needs.
Understanding Meta Model Contribution is just the beginning. Another critical concept that separates successful models from failed ones is feature exposure: how much your model depends on specific features.
Feature exposure measures how concentrated your model's reliance on specific features is. The calculation is straightforward: the Spearman correlation between your predictions and each feature. High exposure means you're betting that those features keep working; low exposure means your predictions don't lean heavily on any single feature.

The problem is that markets are non-stationary. A feature that works great in one regime can be worthless in the next, and when regime shifts hit, high-exposure models don't just underperform; they burn. If your exposure is concentrated in too few features, you're effectively betting that "these features will be good forever."

So you need to monitor both maximum exposure (how reliant you are on your single biggest feature) and mean exposure (which, along with other aggregations, describes how your dependency is distributed across features). Most people aim for a maximum exposure of around 0.29. A sketch of the calculation follows below.
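A minimal way to compute these numbers (a sketch; `preds` is a hypothetical pandas Series of predictions and `features` a DataFrame of feature columns, both indexed by stock):

import pandas as pd

def feature_exposure(preds: pd.Series, features: pd.DataFrame) -> pd.Series:
    # Spearman correlation between the predictions and each feature column
    return features.corrwith(preds, method="spearman")

# exposures = feature_exposure(preds, features)
# exposures.abs().max()   # maximum exposure: your single biggest dependency
# exposures.abs().mean()  # mean exposure: how dependency is spread across features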
Feature neutralization addresses this through residualization, a linear least-squares operation:
\[ \text{neutralized} = \text{predictions} - \text{proportion} \times \text{exposures} \cdot \text{pinv}(\text{exposures}) \cdot \text{predictions} \]
The steps are simple (a numpy sketch follows the next paragraph):
- Stack your feature values into an exposure matrix.
- Fit the predictions on those exposures with a linear least-squares projection (the pseudoinverse).
- Subtract a chosen proportion of the fitted, linear component from the predictions.
This breaks predictions into two parts: direct effects (simple feature relationships) and interactive effects (complex combinations). Neutralization removes the direct part, the obvious signal anyone can find, while keeping the interactions where real skill shows.
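A numpy sketch of the formula above (assuming `predictions` is a 1-D array and `exposures` an aligned 2-D array of feature values; the final rescaling is a common convention, not part of the formula):

import numpy as np

def neutralize(predictions: np.ndarray, exposures: np.ndarray, proportion: float = 1.0) -> np.ndarray:
    # Linear least-squares fit of the predictions on the feature exposures
    linear_part = exposures @ (np.linalg.pinv(exposures) @ predictions)
    # Remove the chosen proportion of that direct, linear component
    neutralized = predictions - proportion * linear_part
    # Rescale so the neutralized predictions remain comparable in magnitude
    return neutralized / neutralized.std()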
Lower feature exposure means more stable performance across different markets, but you might lose some predictive power.
Many people make this mistake: they try to remove all exposure, then end up with a model that predicts nothing useful. What you really want is a balance between stability (working across different markets) and power (actually predicting targets).
Four ways to manage exposure, from simple to advanced:
The best models spread their predictive power across many features instead of relying on just a few. Since features go through good and bad periods, spreading your bets means you always have some features doing well while others do poorly.
This connects directly to MMC: models that add unique information have usually learned to get signal from many different feature combinations instead of obvious patterns.
Alpha discovery is only an early step in the investment process. Enormous value lies in the meta-model and portfolio construction steps: how you filter, cluster, and combine forecasts matters as much as the forecasts themselves.
Building institutional-grade portfolios from crowdsourced signals requires deep expertise in risk management, factor neutralization, and capacity awareness.