Understanding Factor Investing: The Fama-French Model Explained

In our journey so far, we’ve forecasted stock prices, optimized a portfolio, and backtested our strategy. We’ve treated stock returns as a monolithic block of data. But what if we could break those returns down into their fundamental drivers? What if we could explain why our portfolio performed the way it did?

Welcome to the fourth installment in our hands-on quantitative finance series. Today, we learn Factor Investing by implementing one of the most influential models in modern finance: the Fama-French Three-Factor Model. This model revolutionized investment management by showing that stock returns are not just about market risk. They are also systematically driven by other “factors.”

We will use Python to dissect the returns of the portfolio we backtested in our last article. We will uncover its exposure to key market factors and, most importantly, search for that elusive prize: alpha.

The Theory: Beyond CAPM to Fama-French

For decades, the Capital Asset Pricing Model (CAPM) was the standard. It proposed that a stock’s return could be explained by a single factor: its sensitivity to the overall market’s movements (its beta). Any return above what was predicted by beta was considered “alpha”, a measure of a manager’s skill.
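
As a rough illustration of the single-factor idea, the sketch below estimates a CAPM beta and alpha from simulated excess returns. The data, the seed, and the "true" beta of 1.2 are made up for the example; none of it comes from our portfolio.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the example is repeatable

# Simulated daily excess returns (hypothetical numbers, not real market data)
market_excess = rng.normal(0.0004, 0.01, size=2500)
true_beta = 1.2
noise = rng.normal(0.0, 0.005, size=2500)
stock_excess = true_beta * market_excess + noise

# CAPM beta: covariance with the market divided by the market's variance
beta = np.cov(stock_excess, market_excess, ddof=1)[0, 1] / np.var(market_excess, ddof=1)

# CAPM alpha: the average excess return that beta does not account for
alpha = stock_excess.mean() - beta * market_excess.mean()

print(f"Estimated beta: {beta:.3f}")
print(f"Estimated daily alpha: {alpha:.6f}")
```

The estimated beta should land close to the 1.2 we built in, and the alpha close to zero, because the simulated stock has no return source other than the market and noise.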

However, in 1992, Eugene Fama and Kenneth French published a groundbreaking paper showing that two other factors had significant explanatory power over stock returns:

Size (SMB – Small Minus Big): Historically, smaller companies have tended to outperform larger companies over the long term. The SMB factor measures this excess return of small-cap stocks over large-cap stocks.
Value (HML – High Minus Low): “Value” stocks (those with a high book-to-market ratio, meaning they are cheap relative to their book value) have historically outperformed “growth” stocks (those with a low book-to-market ratio). The HML factor measures this excess return.
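
To make the two definitions concrete, here is a small sketch of how the factors are built from six size/value portfolios, following the 2x3 sort that Fama and French use. The one-day portfolio returns are invented for illustration.

```python
# One day of returns for the six size/value portfolios (hypothetical values).
# First letter: S = small, B = big. Second letter: V = value, N = neutral, G = growth.
returns = {
    "SV": 0.012, "SN": 0.009, "SG": 0.006,
    "BV": 0.007, "BN": 0.005, "BG": 0.004,
}

# SMB: average of the three small portfolios minus average of the three big ones
small = (returns["SV"] + returns["SN"] + returns["SG"]) / 3
big = (returns["BV"] + returns["BN"] + returns["BG"]) / 3
smb = small - big

# HML: average of the two value portfolios minus average of the two growth ones
value = (returns["SV"] + returns["BV"]) / 2
growth = (returns["SG"] + returns["BG"]) / 2
hml = value - growth

print(f"SMB = {smb:.4f}, HML = {hml:.4f}")
```

Both factors are long-short return spreads, which is why they can be positive or negative on any given day.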

The Fama-French Three-Factor Model is expressed as a multiple linear regression:

R_p - R_f = α + β_mkt * (R_mkt - R_f) + β_smb * SMB + β_hml * HML + ε

Where:

R_p - R_f is the portfolio’s excess return (return over the risk-free rate).
α (Alpha) is the intercept: the average excess return left over after accounting for the three factor exposures. This is what we’re looking for! A positive and statistically significant alpha suggests the strategy has generated returns not explained by its exposure to these common risk factors.
(R_mkt - R_f) is the market risk premium (the market’s return minus the risk-free rate).
SMB is the return of the size factor.
HML is the return of the value factor.
β (beta) coefficients are the sensitivities of our portfolio to each of these factors. For example, a large positive β_smb means our portfolio behaves like a small-cap fund, while a negative one points to a large-cap tilt.
ε (epsilon) is the error term, representing the portion of the return not explained by the model.
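
Before touching real data, it can help to watch the regression recover coefficients we already know. The sketch below simulates three factor series, builds a portfolio with loadings we choose ourselves, and estimates them back with ordinary least squares via numpy. All numbers are assumptions picked for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 3000

# Simulated daily factor returns (hypothetical, not Kenneth French's data)
mkt = rng.normal(0.0004, 0.010, n_days)
smb = rng.normal(0.0000, 0.005, n_days)
hml = rng.normal(0.0001, 0.005, n_days)

# Loadings we pick ourselves so we can check the estimates afterwards
true_alpha = 0.0
true_betas = np.array([1.0, -0.3, -0.15])
eps = rng.normal(0.0, 0.004, n_days)
excess_ret = (true_alpha + true_betas[0] * mkt
              + true_betas[1] * smb + true_betas[2] * hml + eps)

# Design matrix: a column of ones (for alpha) followed by the three factors
X = np.column_stack([np.ones(n_days), mkt, smb, hml])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
alpha_hat, beta_mkt, beta_smb, beta_hml = coef

print(f"alpha={alpha_hat:.6f}  mkt={beta_mkt:.3f}  smb={beta_smb:.3f}  hml={beta_hml:.3f}")
```

The estimates will not match the chosen loadings exactly because of the noise term, but they should be close. That gap is what the standard errors in a regression summary quantify.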

Our goal is to run this regression for our MPT strategy and interpret the results.

Let’s Get Coding: Implementing the Model

We will analyze the returns of the “MPT Strategy” we generated in the previous backtesting article.

Prerequisites

You will need the statsmodels library for running the regression, plus yfinance and pandas-datareader for fetching the data. If you don’t have them, install them now:

pip install numpy pandas pandas-datareader yfinance matplotlib scipy statsmodels

Step 1: Gathering the Data

First, we need the Fama-French factor data. Luckily, Kenneth French makes this data publicly available on his website. pandas-datareader has a convenient function to download it directly.

We also need the returns of our MPT strategy from the previous article. For this guide, we will quickly regenerate them.

import pandas as pd
import numpy as np
import yfinance as yf  # price data
from scipy.optimize import minimize
import statsmodels.api as sm
import pandas_datareader.data as web  # Fama-French factor data

# --- Regenerate MPT Strategy Returns---

# 1. Fetch Price Data
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'JPM', 'V', 'PG', 'JNJ']
start_date = '2010-01-01'
end_date = '2023-12-31' # Use a consistent end date
adj_close_df = pd.DataFrame()
for ticker in tickers:
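    # auto_adjust=True makes 'Close' the split- and dividend-adjusted price
    data = yf.download(ticker, start=start_date, end=end_date, auto_adjust=True, progress=False)
    # squeeze() handles both a Series and a single-column DataFrame
    # (recent yfinance versions return multi-level columns)
    adj_close_df[ticker] = data['Close'].squeeze()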
log_returns = np.log(adj_close_df / adj_close_df.shift(1)).dropna()

# 2. Optimization Function
risk_free_rate = 0.02
def get_optimal_weights(log_returns_slice):
    mean_returns = log_returns_slice.mean()
    cov_matrix = log_returns_slice.cov()
    num_assets = len(tickers)
    def get_portfolio_stats(weights):
        r = np.sum(mean_returns * weights) * 252
        v = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights))) * np.sqrt(252)
        return np.array([r, v, (r - risk_free_rate) / v])
    def neg_sharpe(weights): return -get_portfolio_stats(weights)[2]
    constraints = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1})
    bounds = tuple((0, 1) for _ in range(num_assets))
    initial_weights = np.array([1./num_assets] * num_assets)
    result = minimize(neg_sharpe, initial_weights, method='SLSQP', bounds=bounds, constraints=constraints)
    return result.x

# 3. Backtest Loop
# Year-end rebalance dates ('YE' needs pandas >= 2.2; use 'Y' on older versions)
rebalance_dates = log_returns.resample('YE').first().index
portfolio_returns_list = []
for i in range(1, len(rebalance_dates) -1):
    start_opt = rebalance_dates[i-1]
    end_opt = rebalance_dates[i]
    start_hold = end_opt
    end_hold = rebalance_dates[i+1]
    optimization_data = log_returns.loc[start_opt:end_opt]
    try:
        optimal_weights = get_optimal_weights(optimization_data)
    except Exception:
        # Fall back to equal weights if the optimizer raises
        optimal_weights = np.array([1./len(tickers)] * len(tickers))
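    # Exclude the rebalance date itself so it is not reused from the optimization
    # window and no date lands in two consecutive holding periods
    in_hold = (log_returns.index > start_hold) & (log_returns.index <= end_hold)
    holding_period_returns = log_returns.loc[in_hold]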
    period_portfolio_return = np.dot(holding_period_returns, optimal_weights)
    portfolio_returns_list.append(pd.Series(period_portfolio_return, index=holding_period_returns.index))
mpt_returns = pd.concat(portfolio_returns_list)
mpt_returns.name = "MPT_Strategy"

print("MPT Strategy Returns (first 5 days):")
print(mpt_returns.head())

# --- Step 1 (Continued): Fetch Fama-French Data ---

# Download the 3-factor model data (daily)
ff_factors = web.DataReader('F-F_Research_Data_Factors_daily', 'famafrench', start=start_date, end=end_date)[0]

# The data is in percentages, so divide by 100
ff_factors = ff_factors / 100

print("\nFama-French Factors (last 5 days):")
print(ff_factors.tail())

This script first reruns our backtest from Part 3 to get the strategy returns. Then it downloads the daily Fama-French factors: Mkt-RF (the market return minus the risk-free rate), SMB (Small Minus Big), and HML (High Minus Low), plus the risk-free rate (RF).
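
If you want to check the percent-to-decimal step without hitting the network, the following sketch applies it to a tiny hand-made table with the same column names as the download. The three rows of values are invented; only the column layout mirrors the real file.

```python
import pandas as pd

# A hand-made stand-in for the downloaded table (values are hypothetical, in percent)
sample = pd.DataFrame(
    {"Mkt-RF": [0.52, -1.10, 0.31], "SMB": [0.10, 0.25, -0.08],
     "HML": [-0.20, 0.40, 0.05], "RF": [0.02, 0.02, 0.02]},
    index=pd.to_datetime(["2023-12-27", "2023-12-28", "2023-12-29"]),
)

# Same conversion as in the script: percent -> decimal
sample_dec = sample / 100

# Rebuild the raw market return by adding the risk-free rate back
sample_dec["Mkt"] = sample_dec["Mkt-RF"] + sample_dec["RF"]

# Rough annualized risk-free rate implied by a 0.02% daily rate
annual_rf = (1 + sample_dec["RF"].iloc[0]) ** 252 - 1

print(sample_dec)
print(f"Implied annual risk-free rate: {annual_rf:.2%}")
```

A quick check like the last line is a useful guard: if you forget to divide by 100, the implied annual rate becomes absurdly large and the mistake is obvious.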

Step 2: Merging the Datasets

To run the regression, we need to align our portfolio returns with the factor data in a single DataFrame.

# Ensure index is datetime for both
mpt_returns.index = pd.to_datetime(mpt_returns.index)
ff_factors.index = pd.to_datetime(ff_factors.index)

# Align the two datasets on their shared dates (inner join on the index)
merged_data = pd.merge(mpt_returns, ff_factors, left_index=True, right_index=True)

# Calculate the portfolio's excess return by subtracting the daily risk-free rate.
# RF is a simple rate and our returns are log returns; at daily frequency the gap is tiny.
merged_data['Portfolio_Excess_Return'] = merged_data['MPT_Strategy'] - merged_data['RF']

# Rename the market excess return column for clarity
merged_data.rename(columns={'Mkt-RF': 'Market_Excess_Return'}, inplace=True)

print("\nMerged Data for Regression (first 5 rows):")
print(merged_data.head())
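
A note on what the merge does: pd.merge on the index defaults to an inner join, so only dates present in both inputs survive. The toy example below shows this on made-up data and adds a duplicate-date check, which is a cheap safeguard before running a regression.

```python
import pandas as pd

# Toy strategy returns: note 2023-01-05 is missing
strategy = pd.Series(
    [0.010, -0.004, 0.007],
    index=pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-06"]),
    name="MPT_Strategy",
)

# Toy factor table: note 2023-01-06 is missing
factors = pd.DataFrame(
    {"Mkt-RF": [0.008, -0.002, 0.003], "RF": [0.0002, 0.0002, 0.0002]},
    index=pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05"]),
)

# Inner join on the date index: only 01-03 and 01-04 appear in both
merged = pd.merge(strategy, factors, left_index=True, right_index=True)
merged["Portfolio_Excess_Return"] = merged["MPT_Strategy"] - merged["RF"]

# Guard against the same date appearing twice
assert not merged.index.duplicated().any(), "duplicate dates in merged data"

print(merged)
```

Only the two shared dates remain in the result, so comparing len(merged) with the length of each input tells you how many observations the alignment dropped.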

Step 3: Running the Regression

Now we can use the statsmodels library to perform the multiple linear regression.

# Define our independent variables (the factors)
X = merged_data[['Market_Excess_Return', 'SMB', 'HML']]

# Define our dependent variable (the portfolio's excess return)
y = merged_data['Portfolio_Excess_Return']

# Add a constant to the independent variables (this is for the alpha intercept)
X = sm.add_constant(X)

# Run the Ordinary Least Squares (OLS) regression
model = sm.OLS(y, X).fit()

# Print the detailed regression results
print("\n--- Fama-French Three-Factor Model Regression Results ---")
print(model.summary())

[Figure: OLS regression results for the Fama-French Three-Factor Model, showing coefficients, R-squared, and statistical significance, with Portfolio_Excess_Return as the dependent variable.]

Step 4: Interpreting the Results

The model.summary() output is packed with information. Here’s how to read the most important parts:

R-squared (0.622): This is one of the most important numbers here. It tells us that 62.2% of the daily variation in our portfolio’s excess returns can be explained by the three Fama-French factors (Market, Size, and Value).
- Interpretation: This is a reasonably strong result. It means the model does a good job of explaining why our portfolio’s returns moved the way they did. The remaining ~38% of the variation is due to other factors not in the model or company-specific news (idiosyncratic risk).
Prob (F-statistic) (0.00): This value tests the overall significance of the regression. A value of 0.00 means that if all our factor coefficients were truly zero, there would be virtually no chance of seeing a fit this strong.
- Interpretation: The model as a whole is statistically significant and useful.
The const (Alpha): This is the daily alpha, which is -0.00007658. To make it more intuitive, let’s annualize it: -0.00007658 * 252 ≈ -1.93%.
- P-value (0.619): This is the most critical piece of information for alpha. The p-value of 0.619 is very high (much greater than the standard 0.05 threshold).
- Verdict: Because the alpha is not statistically significant, we cannot conclude that our strategy generated any real alpha. The small negative alpha we observe is statistically indistinguishable from zero. This is a key insight. After accounting for the portfolio’s exposure to common market risks, our MPT strategy did not deliver any “skill-based” excess return.
The Market_Excess_Return (Market Beta) Coefficient (1.0044): This is the portfolio’s market beta.
- P-value (0.000): This relationship is highly statistically significant.
- Verdict: A beta of ~1.00 means the portfolio moves roughly one-for-one with the overall market on average. For every 1% the market’s excess return rises or falls, our portfolio’s excess return is expected to rise or fall by about 1.0044%, holding the other factors fixed. It carries about the same systematic market risk as the market itself.
The SMB (Size Beta) Coefficient (-0.3124): The coefficient is negative.
- P-value (0.000): The result is highly statistically significant.
- Verdict: A significant negative loading on the SMB factor reveals a strong tilt towards large-cap stocks. This makes perfect sense, as our portfolio (AAPL, MSFT, JPM, etc.) is composed entirely of market giants. When small-cap stocks outperform large-caps, our portfolio tends to lag, and it excels when large-caps are in favor.
The HML (Value Beta) Coefficient (-0.1399): The coefficient is negative.
- P-value (0.000): The result is also highly statistically significant.
- Verdict: A significant negative loading on the HML factor reveals a clear tilt towards growth stocks over value stocks. This is also expected, as many of the tech stocks in our list are considered growth investments. The portfolio tends to do well when growth stocks outperform value stocks.
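
The arithmetic behind these readings is short enough to check by hand. The sketch below uses the coefficients reported above to annualize the alpha and to compute the model's predicted excess return for one day. The factor returns for that day are hypothetical.

```python
# Coefficients reported in the regression output above
alpha_daily = -0.00007658
beta_mkt, beta_smb, beta_hml = 1.0044, -0.3124, -0.1399

# Annualize the daily alpha with 252 trading days
alpha_annual = alpha_daily * 252

# A hypothetical day: market excess return +1%, small caps beat large caps
# by 0.5%, and value beats growth by 0.2%
mkt_excess, smb, hml = 0.010, 0.005, 0.002

predicted_excess = (alpha_daily + beta_mkt * mkt_excess
                    + beta_smb * smb + beta_hml * hml)

print(f"Annualized alpha: {alpha_annual:.2%}")
print(f"Predicted excess return for the day: {predicted_excess:.4%}")
```

On this made-up day the market contributes about +1.00%, while the negative size and value loadings subtract roughly 0.16% and 0.03%. That is the large-cap growth tilt showing up in a single day's numbers.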

Conclusion: A Deeper Understanding

The portfolio is essentially a large-cap, growth-tilted fund with a market beta close to 1. Its performance over the backtest period is well explained by these characteristics, and it did not produce any statistically significant alpha.

You have now moved beyond simply measuring performance to explaining it. By running a Fama-French regression, you’ve deconstructed your MPT strategy’s returns and gained powerful insights:

You’ve quantified its market beta, confirming its risk relative to the market.
You’ve uncovered its style tilts, seeing that it behaves like a large-cap growth fund. This is a critical insight that wasn’t obvious just from looking at the backtest’s final return.
Most importantly, you have calculated its alpha and tested whether it is statistically distinguishable from zero. This provides a measure of its performance after accounting for the common risks it faced.

This type of factor analysis is a daily activity at professional investment firms. It’s used to understand fund performance, control for unintended risks, and actively seek out new sources of alpha.

Next Steps:

More Factors: The Fama-French model has since been expanded to a five-factor model, adding Profitability (RMW) and Investment (CMA). You could try implementing that.
Momentum: Another famous factor is Momentum (MOM), which you could add to the regression.
Analyze Other Portfolios: Run this analysis on your benchmark portfolio (equal weight) or on a real-world mutual fund to see what drives its returns.

You are now equipped with one of the most fundamental tools in quantitative investment analysis. You can look at a return stream and begin to tell the story of what’s happening under the hood.

Happy regressing!

SimplifiedZone
