January 20, 2024

Crowdsourcing Alpha

Imagine paying thousands of dollars for trading signals, only to get variations of simple signals plus noise or overcomplicated nonsense. This is what people face when they try to crowdsource alpha. The question is: why is crowdsourcing trading signals so difficult, and can it actually work? Also, why would a hedge fund share their proprietary datasets?

The Basic Problem

Crowdsourcing alpha seems straightforward at first. If one data scientist can find profitable patterns, surely a thousand data scientists exploring different approaches could uncover even more alpha.

But that is far from reality. Crowdsourcing alpha isn't just a numbers game. It's about designing the right incentives, measuring the right things, and dealing with participants who will game your system whenever they can.

Why Usual Approaches Fail

Several people have run crowdsourcing programs for trading signals with a simple idea: pay contributors for each submission that passes statistical tests and isn't too correlated with common signals.

What happened? Every single submission fell into two categories:

  1. Trivially modified simple signals: Taking basic indicators and adding random noise or small transformations to pass correlation tests
  2. Overcomplicated nonsense: Models that looked sophisticated but contained zero real insight

The participants weren't being malicious. They were being rational. They optimized exactly what was measured: pass tests, collect money. The programs wanted alpha; contributors delivered creative test-passing.

This reveals the fundamental problem: your metrics need to be long-term, reward deep work over gaming, and sit as close as possible to actual business value.

If you don't require participants to have skin in the game, you get adversely selected. Incentives and punishments drive behaviour, and when those incentives are misaligned, you get a pile of correlated garbage instead of genuine alpha.

Then there is the problem of sharing datasets. At the top multi-strategy hedge funds around the world, there are whole teams that source thousands of datasets, clean them and deploy them for data scientists to extract alpha. They are very secretive about the datasets they use.

Yet when you are crowdsourcing alpha research, it is in your interest to share those proprietary datasets- participants cannot find alpha in data they cannot see.

One way to do this is to obfuscate the data. The usual methods are to anonymize symbols, prices, and so on. That is a good start, but there have been instances where participants decoded the dataset to some extent, as in Kaggle's Two Sigma Competition and Kaggle's Jane Street Competition.

In general, Kaggle-style competitions are not a good way to extract crowdsourced alpha, because they are one-time events. The more participants there are, the greater the chance that a few models do well out of sample by sheer luck. There are no follow-ups, no model-performance monitoring, no skin in the game. This goes back to the problems of submissions and user behaviour discussed above.

How Numerai Redesigned the Game

Numerai (https://numer.ai), a hedge fund started in 2015, took a fundamentally different approach. Instead of simply asking for signals and paying for whatever passed basic tests, they built a carefully designed incentive system around a few core principles.

Data Obfuscation

Numerai uses Fully Homomorphic Encryption (FHE) to share its proprietary datasets with participants. FHE allows computations to be performed on encrypted data without ever decrypting it, so privacy is preserved: participants can train and run ML models on the encrypted data and submit their predictions to Numerai.

Skin in the Game

Numerai requires participants to stake NMR (their native cryptocurrency token) to receive allocation. Payouts are proportional to stake. If your model underperforms, you lose your staked tokens. This single design choice filters out low-effort / overfit submissions more effectively than any statistical test.
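
To make the incentive concrete, here is a minimal sketch of a stake-weighted payout rule. The multiplier and the clip are illustrative assumptions, not Numerai's actual payout parameters.

# Illustrative sketch of skin in the game: payouts scale with stake and with
# realized performance, and losses are possible. All parameters here are
# hypothetical, not Numerai's actual payout rule.
def round_payout(stake: float, score: float, multiplier: float = 2.0,
                 max_fraction: float = 0.05) -> float:
    """NMR gained (or burned) for a single scoring round."""
    # Clip so a single round can only move the stake by +/- max_fraction
    fraction = max(-max_fraction, min(max_fraction, multiplier * score))
    return stake * fraction

print(round_payout(stake=100.0, score=0.02))   # +4.0: a useful model earns
print(round_payout(stake=100.0, score=-0.04))  # -5.0: an overfit model burns stake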

Target Alignment

Early on, Numerai asked the users to predict gross returns or excess returns over an index, then residualized these forecasts to common factors behind the scenes. The problem: if submitted signals aren't pre-residualized, they become much weaker after processing. Numerai learned not to make users solve a proxy problem.

They released increasingly sophisticated targets like Cyrus (April 2023) and the Rain dataset (2023) that were aggressively residualized to factors, horizons, and capacity constraints. These targets aligned more closely with the final signals Numerai actually wanted to trade. The lesson: measure and reward the exact target that drives your bottom line.
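
To make "residualized to factors" concrete, here is a rough sketch of residualizing returns against a factor-exposure matrix. The shapes and variable names are assumptions for illustration, not Numerai's actual pipeline.

import numpy as np

def residualize(returns: np.ndarray, factor_exposures: np.ndarray) -> np.ndarray:
    """Remove the part of `returns` explained by `factor_exposures`.

    returns:          (n_stocks,) raw or excess returns
    factor_exposures: (n_stocks, n_factors) betas to common factors
    """
    # Least-squares fit of returns on the factor exposures
    betas, *_ = np.linalg.lstsq(factor_exposures, returns, rcond=None)
    # The residual is what the factor model cannot explain -- the kind of
    # target you actually want participants to predict
    return returns - factor_exposures @ betas

rng = np.random.default_rng(0)
exposures = rng.normal(size=(500, 5))                      # 500 stocks, 5 factors
returns = exposures @ rng.normal(size=5) + 0.1 * rng.normal(size=500)
target = residualize(returns, exposures)
print(np.abs(exposures.T @ target).max())                  # ~0: orthogonal to the factors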

Structural Diversity

Users do a poor job creating diversified signals even when you measure and penalize correlation to other users. Correlation is an easy metric to game. Numerai drives diversity at the platform level by providing fundamentally different targets: value, momentum, residualized 20D, residualized 60D, and so on. This guarantees diversity by construction rather than by hope.

Meta Model Contribution (MMC)

In April 2020, Numerai introduced what may be their most important innovation: Meta Model Contribution. This metric fundamentally changed the game by paying contributors for exactly what Numerai cares about- whether a signal introduces new information the meta model doesn't already have.

The Technical Core: MMC

Meta Model Contribution measures how much unique value your model adds to Numerai's ensemble after accounting for what all other staked models already provide. It's brilliant because it's extremely difficult to game- you can't just orthogonalize to a handful of known signals; you need to be orthogonal to the stake-weighted ensemble of everything submitted.

Understanding the Math

MMC can be understood in three equivalent ways, each offering different intuition:

Richard's MMC (from Numerai founder Richard Craib) asks: "Given a model, how much does the Meta Model's correlation with the target change if we increase the model's stake by a small amount?"

\[ MMC = \lim_{\epsilon \to 0} \frac{\text{corr}(((1-\epsilon)m + \epsilon p), y) - \text{corr}(m, y)}{\epsilon} \]

Where \(y\) is the target, \(m\) is the Meta Model, and \(p\) is a model's prediction vector.

Murky's MMC (derived by community member Murky) uses calculus to arrive at a cleaner formulation:

\[ MMC = y^T \cdot (p - m \cdot (p^T \cdot m) / (m^T \cdot m)) \]

Mike's MMC (the original orthogonalization approach):

\[ MMC = \text{cov}(y, p_{\text{neutral}}) = \text{cov}(y, p - m \cdot (m^\dagger \cdot p)) \]

Here \(m^\dagger\) is the pseudoinverse of \(m\) and \(p_{\text{neutral}}\) is the prediction neutralized (orthogonalized) with respect to the Meta Model.

Deriving Mike's MMC

Let the orthogonal projection of \(p\) onto the span of \(m\) be \(\text{proj}_m(p) = \beta m\), where \(\beta\) minimizes \(\|p - \beta m\|^2\).

Taking the derivative with respect to \(\beta\) and setting it to zero, \(-2m^T (p - \beta m) = 0\), which gives

\[ m^T p = \beta \, m^T m \quad \Longrightarrow \quad \beta = \frac{m^T p}{m^T m} \]

Neutralized predictions:

\[ p_{\text{neutral}} = p - \text{proj}_m(p) = p - \frac{m^T p}{m^T m} m = p - m \cdot (m^\dagger \cdot p) \]

The last step uses the pseudoinverse of a vector, \(m^\dagger = \frac{m^T}{m^T m}\), so \(m^\dagger p = \frac{m^T p}{m^T m} = \beta\).

All three formulations are 100% correlated when predictions and meta model are properly normalized using tie-kept ranking and gaussianization.
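
As a quick sanity check of that claim, the sketch below evaluates all three formulations on random data and confirms they agree; plain standardization stands in for Numerai's tie-kept ranking and gaussianization.

import numpy as np

rng = np.random.default_rng(42)
n = 10_000

def standardize(v):
    # Stand-in for tie-kept ranking + gaussianization: mean 0, std 1
    v = v - v.mean()
    return v / v.std()

y = standardize(rng.normal(size=n))                # target
m = standardize(rng.normal(size=n))                # Meta Model
p = standardize(0.5 * m + rng.normal(size=n))      # a model partially correlated with it

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Richard's MMC: finite-difference approximation of the derivative
eps = 1e-6
richard = (corr((1 - eps) * m + eps * p, y) - corr(m, y)) / eps

# Murky's MMC: closed form
murky = y @ (p - m * (p @ m) / (m @ m))

# Mike's MMC: covariance with the Meta-Model-neutralized prediction
# (for a vector, m_dagger @ p reduces to (m @ p) / (m @ m))
p_neutral = p - m * (m @ p) / (m @ m)
mike = (y @ p_neutral) / n

print(richard, murky / n, mike)                    # identical up to numerical error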

Implementation Details

The calculation follows these steps:

  1. Normalize both your predictions and the Meta Model using tie-kept ranking
  2. Gaussianize each to create standardized distributions with mean 0 and standard deviation 1
  3. Orthogonalize your predictions with respect to the Meta Model (removing any component that's just copying the ensemble)
  4. Calculate the covariance of these orthogonalized predictions with the target

The core Python implementation from the Numerai documentation:

import pandas as pd

def contribution(
    predictions: pd.DataFrame,
    meta_model: pd.Series,
    live_targets: pd.Series,
) -> pd.Series:
    # Rank and normalize so mean=0 and std=1
    p = gaussian(tie_kept_rank(predictions)).values
    m = gaussian(tie_kept_rank(meta_model.to_frame()))[meta_model.name].values

    # Orthogonalize predictions wrt meta model
    neutral_preds = orthogonalize(p, m)
    
    # Center the target (copy rather than mutating the caller's series)
    live_targets = live_targets - live_targets.mean()

    # Covariance (equivalent since mean = 0)
    mmc = (live_targets @ neutral_preds) / len(live_targets)

    return pd.Series(mmc, index=predictions.columns)
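
The helpers used above (tie_kept_rank, gaussian, orthogonalize) are not shown in the snippet. A rough approximation of what they do, assuming pandas inputs (my sketch, not the canonical implementations), looks like this:

import numpy as np
import pandas as pd
from scipy.stats import norm

def tie_kept_rank(df: pd.DataFrame) -> pd.DataFrame:
    # Percentile-rank each column, keeping tied values at the same rank
    return df.rank(method="average", pct=True)

def gaussian(df: pd.DataFrame) -> pd.DataFrame:
    # Map ranks to a standard normal via the inverse CDF; nudge ranks off
    # 0 and 1 so the inverse CDF stays finite
    n = len(df)
    values = norm.ppf((df.values * n - 0.5) / n)
    return pd.DataFrame(values, index=df.index, columns=df.columns)

def orthogonalize(p: np.ndarray, m: np.ndarray) -> np.ndarray:
    # Remove the component of each prediction column that lies along m
    return p - np.outer(m, m @ p) / (m @ m)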

Why MMC Changes Everything

MMC is dramatically more stable than previous metrics. Its distribution over time closely resembles standard correlation (CORR), making it reliable for model optimization. More importantly, it's far harder to game than simple correlation thresholds.

When you're evaluated on MMC, you can't succeed by:

  1. Copying the Meta Model, or crowding into signals everyone else already submits
  2. Adding noise to basic indicators just to slip past correlation thresholds
  3. Dressing up complexity that contains no real insight

You can only score well by discovering genuinely novel, orthogonal sources of alpha. This is exactly what Numerai needs.

Feature Exposure and Neutralization

Understanding Meta Model Contribution is just the beginning. Another critical concept that separates successful models from failed ones is feature exposure- how much your model depends on specific features.

The Regime Dependency Problem

Feature exposure measures how concentrated your model's reliance is on specific features. The calculation is straightforward: Spearman correlation between your predictions and each feature. High exposure means you're betting those features continue to work. Low exposure means your features have minimal linear relationship to the target.

Markets are non-stationary. Features that perform well in one regime can become worthless in the next. When regime shifts hit, high-exposure models don't just underperform- they burn. If your maximum feature exposure is too high and concentrated in too few features, you're betting that "these features will be good forever."

You need to monitor both maximum exposure and mean exposure. Maximum exposure tells you how reliant you are on a single feature; mean exposure, along with other aggregations, gives you a sense of how your dependency is distributed. Many participants aim for a maximum feature exposure of roughly 0.29.
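
A minimal sketch of that exposure calculation, with hypothetical inputs (one prediction per stock, one column per feature):

import numpy as np
import pandas as pd

def feature_exposure(predictions: pd.Series, features: pd.DataFrame) -> pd.Series:
    # Spearman correlation between the predictions and every feature column
    return features.corrwith(predictions, method="spearman")

rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(1000, 20)),
                        columns=[f"feature_{i}" for i in range(20)])
predictions = pd.Series(0.5 * features["feature_0"] + rng.normal(size=1000))

exposures = feature_exposure(predictions, features)
print("max exposure: ", exposures.abs().max())    # reliance on the riskiest single feature
print("mean exposure:", exposures.abs().mean())   # overall concentration of dependency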

The Math of Neutralization

Feature neutralization fixes this problem through residualization- a linear least squares operation:

\[ \text{neutralized} = \text{predictions} - \text{proportion} \times \text{exposures} \cdot \text{pinv}(\text{exposures}) \cdot \text{predictions} \]

The steps are simple:

  1. Regress your predictions against features
  2. Calculate fitted values (the part explained by features)
  3. Take residuals (what's left after removing feature dependence)

This breaks predictions into two parts: direct effects (simple feature relationships) and interactive effects (complex combinations). Neutralization removes the direct part- the obvious signal anyone can find- while keeping the interactions where real skill shows.
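
Here is a minimal sketch of that residualization, following the formula above. The column conventions and the final rescaling are assumptions for illustration.

import numpy as np
import pandas as pd

def neutralize(predictions: pd.Series, features: pd.DataFrame,
               proportion: float = 1.0) -> pd.Series:
    # Fitted values of a least-squares regression of predictions on features
    p = predictions.values.astype(float)
    F = features.values.astype(float)
    explained = F @ (np.linalg.pinv(F) @ p)
    # Keep only the residual (scaled by `proportion`), then rescale
    neutral = p - proportion * explained
    return pd.Series(neutral / neutral.std(), index=predictions.index)

rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(1000, 20)))
predictions = pd.Series(0.6 * features[0] + rng.normal(size=1000))

neutral = neutralize(predictions, features, proportion=1.0)
print(np.abs(features.values.T @ neutral.values).max())   # ~0: linear exposure removed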

The Tradeoff You Can't Avoid

Lower feature exposure means more stable performance across different markets, but you might lose some predictive power.

Many people make this mistake: they try to remove all exposure, then end up with a model that predicts nothing useful. What you really want is a balance between stability (working across different markets) and power (actually predicting targets).

Four ways to manage exposure, from simple to advanced:

  1. L1 regularization: Add Lasso penalties when training. Easy to do, gives small improvements but limited upside.
  2. Full or partial neutralization: Residualize against all features with proportion=1.0 (or < 1.0). Removes exposure but also removes a lot of signal.
  3. Selective neutralization: Only target your riskiest features- maybe the top 10%. This carefully removes bad exposures while keeping good ones (see the sketch after this list).
  4. Feature timing: Dynamically adjust which features to use based on their current performance. Weight features higher when they're working well and reduce reliance when they're not. This adapts to changing market conditions instead of using a fixed feature set.
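
As a sketch of the third option, the function below neutralizes only against the features with the largest absolute Spearman exposure; the 10% cutoff and the pandas conventions are assumptions, not a canonical recipe.

import numpy as np
import pandas as pd

def selective_neutralize(predictions: pd.Series, features: pd.DataFrame,
                         top_fraction: float = 0.10) -> pd.Series:
    # Rank features by absolute Spearman exposure and pick the riskiest ones
    exposures = features.corrwith(predictions, method="spearman").abs()
    n_risky = max(1, int(len(exposures) * top_fraction))
    risky = exposures.nlargest(n_risky).index

    # Residualize only against the risky subset, leaving the rest untouched
    F = features[risky].values.astype(float)
    p = predictions.values.astype(float)
    neutral = p - F @ (np.linalg.pinv(F) @ p)
    return pd.Series(neutral / neutral.std(), index=predictions.index)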

Why Top Models Stay Consistent

The best models spread their predictive power across many features instead of relying on just a few. Since features go through good and bad periods, spreading your bets means you always have some features doing well while others do poorly.

This connects directly to MMC: models that add unique information have usually learned to get signal from many different feature combinations instead of obvious patterns.

Beyond Signal Generation

Alpha discovery is only an early step in the investment process. Enormous value exists in the meta-model and portfolio construction steps- how you filter, cluster, and combine forecasts matters as much as the forecasts themselves.

Building institutional-grade portfolios from crowdsourced signals requires deep expertise in risk management, factor neutralization, and capacity awareness.

References

  1. Numerai discussion forums: https://forum.numer.ai/