Chapter 2
2.1 Learning Objectives
By the end of this chapter, you will be able to:
- Explain why standard Python lists are inefficient for large-scale financial data.
- Create and manipulate 1D (vectors) and 2D (matrices) NumPy arrays.
- Perform vectorized operations to calculate returns across thousands of assets simultaneously.
- Use Linear Algebra (dot products) to calculate portfolio variance and expected returns.
2.2 Introduction: The Need for Speed
In Chapter 1, we calculated the return of a stock using a for loop. While this works for a few data points, it does not scale to the volume of data analyzed at the institutional level.
The Problem:
Imagine analyzing the S&P 500 index.
- 500 companies.
- 20 years of daily data (~5,000 trading days).
- Total data points: 500 × 5,000 = 2,500,000.
Looping through 2.5 million items in pure Python is slow, because the interpreter processes each element one at a time.
The Solution: NumPy (Numerical Python)
NumPy provides a new data structure called the ndarray (N-dimensional array). Unlike Python lists, which store pointers to individual objects scattered across memory, NumPy arrays store values of a single data type in one contiguous block of memory (like arrays in C or Fortran). This enables Vectorization: applying a mathematical operation to an entire array at once, without writing a Python loop.
To use NumPy, we import it using the universal standard alias np.
import numpy as np
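To see the difference on your own machine, the sketch below times a pure-Python loop against the equivalent vectorized expression on 2.5 million simulated returns (the S&P 500 example above, flattened). Timings depend on hardware and library versions, so no expected numbers are shown; the variable names and the simulated data are illustrative assumptions.

```python
import time

import numpy as np

# 2.5 million simulated daily returns (500 stocks x 5,000 days, flattened)
rng = np.random.default_rng(seed=0)
returns = rng.normal(0.0, 0.01, size=2_500_000)
returns_list = returns.tolist()  # the same data as a plain Python list

# Pure-Python loop: convert each return into a growth factor (1 + r)
start = time.perf_counter()
loop_result = []
for r in returns_list:
    loop_result.append(1.0 + r)
loop_seconds = time.perf_counter() - start

# Vectorized: one expression over the whole array
start = time.perf_counter()
vector_result = 1.0 + returns
vector_seconds = time.perf_counter() - start

print(f"Loop:       {loop_seconds:.4f} s")
print(f"Vectorized: {vector_seconds:.4f} s")
print("Same values:", np.allclose(loop_result, vector_result))
```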
2.3 The NumPy Array
The core building block of quantitative finance is the Array.
1. Creating Arrays from Lists
We can convert a standard Python list into a NumPy array.
Financial Context:
1D Array (Vector): Represents a single time series (e.g., closing prices of AAPL).
2D Array (Matrix): Represents a panel of data (e.g., Rows = Dates, Columns = Different Stocks).
import numpy as np
# A list of returns for 3 days
returns_list = [0.01, 0.02, -0.01]
# Convert to NumPy Array
returns_arr = np.array(returns_list)
print(returns_arr)
# Output: [ 0.01 0.02 -0.01] (Notice no commas in output)
print(type(returns_arr))
# Output: <class 'numpy.ndarray'>
2. Array Attributes
When loading data, it is critical to know its “shape” (dimensions).
- .ndim: Number of dimensions (1 for vector, 2 for matrix).
- .shape: The size of each dimension (Rows, Columns).
- .dtype: The data type (e.g., float64, int32).
# A 2D Matrix: 2 Stocks (Rows), 3 Days of Prices (Cols)
price_matrix = np.array([
[100, 101, 102], # Stock A
[50, 48, 49] # Stock B
])
print(f"Dimensions: {price_matrix.ndim}")  # 2
print(f"Shape: {price_matrix.shape}")  # (2, 3) -> 2 Rows, 3 Cols
print(f"Data Type: {price_matrix.dtype}")  # int64 (or int32 on some platforms)
int (Integer): Whole numbers without a fraction (e.g., 10, -5).
float (Floating Point): Numbers with a decimal point (e.g., 3.14, -0.001).
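The dtype matters because NumPy uses one type for the whole array. In this minimal sketch, integer prices produce an integer array, a single decimal value promotes every element to float64, and .astype() converts explicitly. The integer width (int64 vs int32) can vary by platform, so the last line checks the dtype "kind" rather than the width.

```python
import numpy as np

int_prices = np.array([100, 101, 102])        # all whole numbers -> integer dtype
mixed_prices = np.array([100, 101.5, 102])    # one decimal -> whole array becomes float64
float_prices = int_prices.astype(np.float64)  # explicit conversion to float

print(int_prices.dtype)    # an integer type (int64 on most platforms)
print(mixed_prices.dtype)  # float64
print(float_prices)        # [100. 101. 102.]

# dtype.kind is 'i' for signed integers and 'f' for floating point
print(int_prices.dtype.kind, mixed_prices.dtype.kind, float_prices.dtype.kind)  # i f f
```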
2.4 Generating Financial Data
Often in modeling, we need to generate data from scratch, such as creating a dummy portfolio or setting up a simulation grid. NumPy has specific functions for this.
1. np.zeros() and np.ones()
Useful for initializing weights in a portfolio. If we have an empty portfolio, we start with zeros.
# Create an array of 5 zeros (representing 0% allocation to 5 stocks)
weights = np.zeros(5)
print(weights)
# Output: [0. 0. 0. 0. 0.]
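The heading also mentions np.ones(). One common use, shown here as an illustration, is an equal-weight portfolio: start with an array of ones and divide by the number of assets so the weights sum to 100%.

```python
import numpy as np

n_stocks = 5
equal_weights = np.ones(n_stocks) / n_stocks  # 1/5 = 20% allocated to each stock

print(equal_weights)                           # [0.2 0.2 0.2 0.2 0.2]
print(np.isclose(equal_weights.sum(), 1.0))    # True: weights sum to 100%
```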
2. np.arange() (Array Range)
Similar to Python’s range() but returns an array. Useful for generating time indices (e.g., “Day 1 to Day 10”).
days = np.arange(1, 11) # Start at 1, stop before 11
print(days)
# Output: [ 1 2 3 4 5 6 7 8 9 10]
3. np.linspace() (Linear Space)
Crucial for Sensitivity Analysis.
Financial Context:
You want to price a bond under different yield assumptions. You need 5 scenarios ranging from 1% to 5%. np.linspace generates a fixed number of evenly spaced values between two endpoints, including both endpoints by default.
# Generate 5 rates between 0.01 and 0.05
rates = np.linspace(0.01, 0.05, 5)
print(rates)
# Output: [0.01 0.02 0.03 0.04 0.05]
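Once the scenarios exist, a single vectorized expression prices the bond under every rate. As an illustrative assumption, the sketch uses a 5-year zero-coupon bond with a face value of 100, priced as Face / (1 + r)^T with annual compounding.

```python
import numpy as np

rates = np.linspace(0.01, 0.05, 5)  # 1% to 5%, as above
face_value = 100.0                  # assumed face value
years = 5                           # assumed maturity in years

# Present value of a zero-coupon bond, computed for all 5 rates at once
bond_prices = face_value / (1 + rates) ** years
print(np.round(bond_prices, 2))  # [95.15 90.57 86.26 82.19 78.35]
```

Higher rates give lower prices, which is the sensitivity the scenario grid is meant to expose.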
4. np.random (Random Number Generation)
We used Python's built-in random module in Chapter 1. NumPy's random module is much faster for bulk work because it can generate entire arrays or matrices in a single call. This is the engine behind Monte Carlo simulations.
# Generate 10 random values from the Standard Normal distribution (Mean=0, Std Dev=1)
random_returns = np.random.normal(0, 1, 10)
print(random_returns)
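For real simulations you usually want realistic parameters and reproducible results. NumPy's newer Generator interface (np.random.default_rng) accepts a seed, so the same seed reproduces the same draws. The 0.05% daily mean and 1% daily standard deviation below are illustrative assumptions, not estimates.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator -> reproducible draws

# 252 trading days x 3 stocks of simulated daily returns
sim_returns = rng.normal(loc=0.0005, scale=0.01, size=(252, 3))
print(sim_returns.shape)  # (252, 3)

# Re-creating the generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
same = rng_again.normal(loc=0.0005, scale=0.01, size=(252, 3))
print(np.array_equal(sim_returns, same))  # True
```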
Check Your Understanding
- Exercise 2.1 (Shape Analysis):
Create a 2D array representing the prices of 3 stocks over 5 days. Use random integers between 100 and 200 to fill it. Print the .shape of the array. Hint: Use np.random.randint().
- Exercise 2.2 (Scenario Generation):
You are stress-testing a portfolio. You need to test how it performs if the market drops by 10%, 20%, 30%, 40%, and 50%. Use np.linspace or np.arange to generate an array of these 5 stress scenarios (as decimals: -0.10 to -0.50).
# Solution 2.1
# 3 Stocks (Rows), 5 Days (Cols)
stock_data = np.random.randint(100, 200, size=(3, 5))
print("Stock Data:\n", stock_data)
print(“Shape:”, stock_data.shape) # Should be (3, 5)
# Solution 2.2
# Using linspace to get exactly 5 points
stress_scenarios = np.linspace(-0.10, -0.50, 5)
print("Stress Scenarios:", stress_scenarios)
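The exercise also allows np.arange. One sketch that avoids accumulating a floating-point step is to build the integers 1 to 5 with np.arange and scale them; np.allclose then confirms the result matches the linspace version up to rounding.

```python
import numpy as np

# Integers 1..5 scaled by -10% -> -0.10, -0.20, ..., -0.50
stress_arange = -0.10 * np.arange(1, 6)
stress_linspace = np.linspace(-0.10, -0.50, 5)

print(stress_arange)
print(np.allclose(stress_arange, stress_linspace))  # True
```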
2.5 Vectorized Operations
In Chapter 1, if we wanted to add $10 to a list of stock prices, we had to write a loop.
In NumPy, we simply add 10 to the array. This is called Vectorization.
1. Element-Wise Arithmetic
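Vectorization means applying an operation to an entire array at once. NumPy handles the looping internally in optimized, compiled C code, which is often one to two orders of magnitude faster than an equivalent Python loop (the exact speed-up depends on the operation and the array size).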
Comparison: The “Loop” way vs. The “NumPy” way
Financial Context:
We have prices for 3 stocks, and we want to apply a $1.50 dividend adjustment to all of them.
import numpy as np
# Standard Python List approach (Slow & Verbose)
prices_list = [100.0, 50.0, 25.0]
adjusted_list = []
for p in prices_list:
    adjusted_list.append(p + 1.50)
# NumPy Approach (Fast & Clean)
prices_arr = np.array([100.0, 50.0, 25.0])
adjusted_arr = prices_arr + 1.50  # <- Vectorization in action!
print(adjusted_arr)
# Output: [101.5 51.5 26.5]
2. Operations Between Arrays
We can also perform math between two arrays. This works element-by-element.
Condition: The arrays must have the same shape (e.g., both of length 3) or shapes that are compatible under NumPy's broadcasting rules; otherwise NumPy raises an error.
Financial Context: Calculating Daily P&L
We have the number of shares held and the price change for a portfolio of 3 stocks.
$$\text{P\&L} = \text{Shares} \times (\text{Price}_{\text{Close}} - \text{Price}_{\text{Open}})$$
shares = np.array([100, 50, 200])
price_open = np.array([150.0, 200.0, 50.0])
price_close = np.array([155.0, 198.0, 52.0])
# Step 1: Calculate Price Change (Vector Subtraction)
price_change = price_close - price_open
# Result: [5.0, -2.0, 2.0]
# Step 2: Calculate Profit (Vector Multiplication)
daily_profit = shares * price_change
print(f"Profit per stock: {daily_profit}")
# Output: [ 500. -100. 400.]
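The same-shape condition is relaxed by broadcasting: NumPy can stretch a 1D array across every row of a 2D array when the trailing dimensions match. As an illustration, with a made-up second day of price changes, the share counts (shape (3,)) are applied to a (2, 3) matrix of price changes (2 days x 3 stocks) in one step.

```python
import numpy as np

shares = np.array([100, 50, 200])  # shape (3,): one entry per stock

# Price changes for 2 days (rows) x 3 stocks (columns)
price_changes = np.array([
    [5.0, -2.0, 2.0],  # Day 1 (from the example above)
    [-1.0, 3.0, 0.5],  # Day 2 (illustrative values)
])

# Broadcasting: shares is applied to every row of price_changes
pnl = price_changes * shares
print(pnl.shape)  # (2, 3)
print(pnl)        # Day 1: 500, -100, 400 / Day 2: -100, 150, 100
```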
2.6 Universal Functions (ufuncs)
Finance involves more than just + and -. We need logarithms for returns, square roots for volatility, and exponents for compounding. NumPy provides Universal Functions (ufuncs) for this.
1. Log Returns (np.log)
Financial Context:
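Log returns, built on the natural logarithm, are the standard way to measure returns in quantitative finance:
$$r_t^{\log} = \ln P_t - \ln P_{t-1}$$
prices = np.array([100.0, 102.0, 105.0, 103.0])
# Step 1: np.log is a ufunc, so it takes the log of every price at once
log_prices = np.log(prices)
print(log_prices)
# Output: [4.605… 4.624… 4.653… 4.634…]
# Step 2: Log returns are the differences between consecutive log prices
log_returns = np.diff(log_prices)
print(log_returns)
# Output (approx.): [ 0.0198 0.0290 -0.0192]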
2. Square Root (np.sqrt)
Financial Context:
Used for volatility scaling.
variance = 0.04
volatility = np.sqrt(variance)
print(volatility) # 0.2
Volatility scaling is a technique used in quantitative finance to adjust the exposure of an investment or a time-series estimate based on risk. It typically refers to two distinct but related concepts: scaling across time and scaling for position sizing.
1. Scaling Across Time (The Square Root Rule)
In risk management, you often need to convert volatility from one timeframe to another (e.g., daily to annual). Under the assumption that returns are independent and identically distributed (i.i.d.), volatility scales with the square root of time.
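$$\sigma_T = \sigma_1 \times \sqrt{T}$$
where $\sigma_1$ is the volatility over one period (e.g., one day) and $T$ is the number of periods.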
Example: To annualize a daily volatility of 1%, you multiply it by the square root of the number of trading days in a year (typically 252).
$$1\% \times \sqrt{252} \approx 15.87\%$$
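The square-root rule is a single vectorized multiplication. This minimal sketch assumes 252 trading days per year and i.i.d. returns, as stated above; the last two lines annualize several illustrative daily volatilities at once.

```python
import numpy as np

trading_days = 252  # common convention for a trading year

daily_vol = 0.01
annual_vol = daily_vol * np.sqrt(trading_days)
print(round(annual_vol, 4))  # 0.1587 (about 15.87%)

# The same rule applied to several assets at once (illustrative daily vols)
daily_vols = np.array([0.01, 0.015, 0.02])
print(daily_vols * np.sqrt(trading_days))
```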
2. Volatility Targeting (Position Sizing)
This is a strategic application where a portfolio manager adjusts leverage to maintain a constant level of risk. The goal is to ensure that a strategy contributes the same amount of risk regardless of whether the market is calm or turbulent.
High Volatility: Reduce position size (de-leverage).
Low Volatility: Increase position size (leverage up).
The Weighting Formula:
The weight ($W_t$) assigned to an asset is inversely proportional to its forecasted volatility ($\sigma_t$):
$$W_t = \frac{\text{Target Volatility}}{\sigma_t}$$
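Because the formula divides one number by an array of forecasts, it vectorizes directly. The sketch assumes a 10% target volatility and three made-up volatility forecasts; a real strategy would typically also cap leverage, which is omitted here.

```python
import numpy as np

target_vol = 0.10                             # assumed target volatility (10%)
forecast_vols = np.array([0.05, 0.10, 0.20])  # illustrative volatility forecasts

# W_t = Target Volatility / sigma_t, element by element
position_weights = target_vol / forecast_vols
print(position_weights)  # [2.  1.  0.5]
# Calm market (5% vol) -> 2x leverage; turbulent market (20% vol) -> half size
```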
3. Absolute Value (np.abs)
Financial Context:
Useful for measuring the magnitude of deviations, such as the daily gap between a portfolio and its benchmark (the raw ingredient of "tracking error"), without worrying about direction (positive/negative).
returns = np.array([-0.05, 0.03, -0.01])
magnitudes = np.abs(returns)
print(magnitudes)
# Output: [0.05 0.03 0.01]
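To make the deviation idea concrete, the sketch compares a portfolio with a benchmark (both return series are made up). The mean of np.abs gives the average size of the deviation; for reference, tracking error is usually defined as the standard deviation of these active returns, computed on the last line with ddof=1 (sample standard deviation).

```python
import numpy as np

portfolio = np.array([0.012, -0.004, 0.007, 0.001])  # illustrative daily returns
benchmark = np.array([0.010, -0.002, 0.004, 0.003])  # illustrative benchmark returns

active = portfolio - benchmark           # active (excess) return each day
mean_abs_dev = np.mean(np.abs(active))   # average deviation size, ignoring sign
tracking_error = np.std(active, ddof=1)  # standard deviation of active returns

print(active)
print(mean_abs_dev)  # approximately 0.00225
print(tracking_error)
```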
2.7 Aggregation and Axes
Once we have calculated returns for 500 stocks, we usually want to summarize them: “What was the average return?” or “What was the maximum loss?”
1. Simple Aggregations (sum, mean, min, max, std)
portfolio_returns = np.array([0.05, -0.02, 0.01, 0.03])
print(f"Sum of Returns: {np.sum(portfolio_returns)}")  # simple sum, ignores compounding
print(f"Average Return: {np.mean(portfolio_returns)}")
print(f"Volatility (Std Dev): {np.std(portfolio_returns)}")
print(f"Max Loss (Worst Return): {np.min(portfolio_returns)}")
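Two details are easy to miss here. First, adding simple returns ignores compounding; the compounded total return is the product of (1 + r) minus 1. Second, np.std defaults to ddof=0 (population standard deviation); pass ddof=1 for the sample standard deviation, the usual estimator when working from a sample of historical returns.

```python
import numpy as np

portfolio_returns = np.array([0.05, -0.02, 0.01, 0.03])

simple_sum = np.sum(portfolio_returns)           # no compounding
compounded = np.prod(1 + portfolio_returns) - 1  # growth of $1, minus the $1
print(f"Sum of returns:   {simple_sum:.4f}")     # 0.0700
print(f"Compounded total: {compounded:.4f}")     # 0.0705

pop_std = np.std(portfolio_returns)             # ddof=0: divide by N
sample_std = np.std(portfolio_returns, ddof=1)  # ddof=1: divide by N - 1
print(pop_std < sample_std)  # True
```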
2. The axis Parameter (Critical for 2D Data)
When working with a 2D Matrix (Rows=Time, Cols=Stocks), we can aggregate in two directions:
- axis=0: “Collapse the rows.” (Calculate down the column). Used to find the average return for each stock over time.
- axis=1: “Collapse the columns.” (Calculate across the row). Used to find the average return of the entire portfolio on a specific day.
Financial Context:
We have a matrix of returns for 2 stocks over 3 days.
# 3 Days (Rows), 2 Stocks (Cols)
returns_matrix = np.array([
[0.01, 0.05], # Day 1: Stock A=1%, Stock B=5%
[0.02, -0.01], # Day 2: Stock A=2%, Stock B=-1%
[0.01, 0.02] # Day 3: Stock A=1%, Stock B=2%
])
# Scenario A: What is the average return for EACH STOCK? (Down the columns)
avg_stock_returns = np.mean(returns_matrix, axis=0)
print("Average per Stock:", avg_stock_returns)
# Output: [0.0133… 0.02] (Stock A avg, Stock B avg)
# Scenario B: What is the DAILY performance of an equal-weighted portfolio? (Across the rows)
daily_portfolio_return = np.mean(returns_matrix, axis=1)
print("Daily Portfolio Performance:", daily_portfolio_return)
# Output: [0.03 0.005 0.015] (Day 1 avg, Day 2 avg, Day 3 avg)
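np.mean(axis=1) treats both stocks equally. If the portfolio is not equally weighted, np.average accepts a weights argument along the same axis. The 70/30 split below is an assumption for illustration; the last line shows that other aggregations (here np.min) take the same axis argument.

```python
import numpy as np

returns_matrix = np.array([
    [0.01, 0.05],   # Day 1
    [0.02, -0.01],  # Day 2
    [0.01, 0.02],   # Day 3
])

stock_weights = np.array([0.7, 0.3])  # assumed: 70% Stock A, 30% Stock B

# Weighted average across each row (one value per day)
weighted_daily = np.average(returns_matrix, axis=1, weights=stock_weights)
print(weighted_daily)  # approximately [0.022 0.011 0.013]

# Worst single-day return for each stock (down each column)
print(np.min(returns_matrix, axis=0))  # [ 0.01 -0.01]
```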
Check Your Understanding
- Exercise 2.3 (Vectorized Impact):
You have an array of prices: prices = np.array([100, 102, 104, 106]).
A market crash occurs, and all prices drop by 20%. Write one line of code to create a new array crashed_prices.
- Exercise 2.4 (Filtering with Logic):
Bonus Concept: You can filter arrays using conditions like prices[prices > 100].
Given returns = np.array([0.05, -0.02, 0.03, -0.04, 0.01]), write code to:
a) Create a boolean mask identifying all negative returns.
b) Calculate the mean of only the negative returns (Average Loss).
# Solution 2.3
prices = np.array([100, 102, 104, 106])
crashed_prices = prices * (1 - 0.20)
print("Crashed:", crashed_prices)
# Solution 2.4
returns = np.array([0.05, -0.02, 0.03, -0.04, 0.01])
# a) Boolean Mask
negative_mask = returns < 0
print("Negative Days:", negative_mask)  # [False True False True False]
# b) Filter and Mean
negative_returns = returns[negative_mask]
average_loss = np.mean(negative_returns)
print("Average Loss:", average_loss)  # -0.03
2.8 Indexing and Slicing
In financial data analysis, you rarely use the entire dataset at once. You might want the “first 30 days,” “only the tech stocks,” or “just the closing prices.” NumPy offers powerful tools to slice data efficiently.
1. 1D Slicing (Time Series)
Slicing a 1D array works exactly like slicing a Python list: [start:stop:step].
Financial Context:
We have 10 days of stock prices. We want to analyze the first week (first 5 days) and the trend of the last 2 days.
import numpy as np
prices = np.array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])
# Select first 5 days (Indices 0, 1, 2, 3, 4)
first_week = prices[0:5]
print("First Week:", first_week)
# Select the last 2 days (Indices -2 to end)
recent_trend = prices[-2:]
print("Recent Trend:", recent_trend)
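The third slice argument, step, is handy for resampling. For example, prices[::5] keeps every fifth observation (a rough weekly sample from daily data, assuming 5 trading days per week), and prices[::-1] reverses the series.

```python
import numpy as np

prices = np.array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

weekly = prices[::5]            # every 5th day, starting at index 0
reversed_prices = prices[::-1]  # newest observation first

print(weekly)           # [100 105]
print(reversed_prices)  # [109 108 107 106 105 104 103 102 101 100]
```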
2. 2D Slicing (The Data Matrix)
This is where NumPy shines. When you have a 2D matrix (Rows=Time, Columns=Stocks), you slice using a comma to separate the dimensions: [rows, columns].
The Syntax:
matrix[ start_row:end_row , start_col:end_col ]
Financial Context:
Imagine a matrix where:
Rows represent Days (Day 0, Day 1, Day 2).
Columns represent Stocks (Stock A, Stock B, Stock C).
# Create a 3×3 Matrix
data = np.array([
[100, 50, 25], # Day 0 prices for A, B, C
[101, 52, 26], # Day 1
[102, 51, 24] # Day 2
])
# Scenario A: Select specific STOCK (Column Slicing)
# We want all days (:) for Stock B (Index 1)
stock_b_prices = data[:, 1]
print("Stock B Prices:", stock_b_prices)
# Output: [50 52 51]
# Scenario B: Select specific DAY (Row Slicing)
# We want Day 0 (Index 0) for all stocks (:)
day_0_prices = data[0, :]
print("Day 0 Prices:", day_0_prices)
# Output: [100 50 25]
# Scenario C: A Sub-section
# First 2 days, First 2 stocks
sub_section = data[0:2, 0:2]
print("Sub-section:\n", sub_section)
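One behavior differs from Python lists: a basic NumPy slice is a view onto the same memory, not a copy, so modifying the slice modifies the original matrix. When you want an independent subset (for example, to adjust prices in a scenario without touching the raw data), call .copy(). The values written below are arbitrary.

```python
import numpy as np

data = np.array([
    [100, 50, 25],
    [101, 52, 26],
    [102, 51, 24],
])

stock_b_view = data[:, 1]         # view: shares memory with data
stock_b_copy = data[:, 1].copy()  # independent copy

stock_b_view[0] = 999  # writes through to data
stock_b_copy[1] = -1   # does NOT affect data

print(data[0, 1])  # 999
print(data[1, 1])  # 52
```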
2.9 Boolean Indexing (Filtering)
Often, we don’t know the position (index) of the data we want; we only know the condition.
- “Show me returns that are negative.”
- “Find days where price > 100.”
This is called Boolean Indexing or Masking.
1. Creating the Mask
First, we ask a question. NumPy returns an array of True and False.
returns = np.array([0.05, -0.02, 0.03, -0.01, 0.04])
# Question: Which days were losses?
is_loss = returns < 0
print(is_loss)
# Output: [False True False True False]
2. Applying the Mask
We use the boolean array inside square brackets [] to filter the original data.
# Filter the returns array to keep only losses
losses = returns[is_loss]
print("Losses:", losses)
# Output: [-0.02 -0.01]
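Because True counts as 1 and False as 0, summing or averaging a mask counts the matches. Masks can also be combined with & (and) and | (or); each condition needs its own parentheses because of operator precedence. The -1.5% cut-off below is an arbitrary choice for illustration.

```python
import numpy as np

returns = np.array([0.05, -0.02, 0.03, -0.01, 0.04])
is_loss = returns < 0

print(np.sum(is_loss))   # 2 losing days
print(np.mean(is_loss))  # 0.4 -> 40% of days were losses

# Small losses only: negative AND above -1.5%
small_loss = (returns < 0) & (returns > -0.015)
print(returns[small_loss])  # [-0.01]
```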
3. Outlier Detection
Financial Context:
In risk management, we often care about extreme events. Let’s filter for days where the movement was “significant” (more than +/- 3%).
daily_moves = np.array([0.01, -0.04, 0.005, 0.06, -0.01])
# We use np.abs() to check magnitude regardless of direction
extreme_mask = np.abs(daily_moves) > 0.03
significant_days = daily_moves[extreme_mask]
print("Extreme Events:", significant_days)
# Output: [-0.04 0.06]
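Filtering isolates the extreme days. Sometimes you instead want to keep every day but cap the size of each move, which is similar in spirit to winsorizing; np.clip does this in one call. The ±3% bounds mirror the threshold above and are an illustrative choice.

```python
import numpy as np

daily_moves = np.array([0.01, -0.04, 0.005, 0.06, -0.01])

# Values below -0.03 become -0.03; values above 0.03 become 0.03
capped = np.clip(daily_moves, -0.03, 0.03)
print(capped)  # -0.04 -> -0.03 and 0.06 -> 0.03; other values unchanged
```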
Check Your Understanding
- Exercise 2.5 (Matrix Extraction):
You have a matrix prices of shape (5, 4) representing 5 days and 4 stocks.
Write code to extract the history of the last stock (the last column) for all days.
- Exercise 2.6 (Crisis Detector):
You have an array of returns: returns = np.array([-0.05, 0.02, 0.01, -0.10, 0.04]).
Write a script that:
a) Creates a mask for “Crash Days” (returns less than -0.04).
b) Uses the mask to print the specific values of those crash returns.
# Solution 2.5
# We simulate the data first
prices = np.random.randint(100, 200, size=(5, 4))
# Slice: All rows (:), Last column (-1)
last_stock = prices[:, -1]
print("Last Stock History:", last_stock)
# Solution 2.6
returns = np.array([-0.05, 0.02, 0.01, -0.10, 0.04])
# a) Create Mask
crash_mask = returns < -0.04
# b) Apply Mask
crash_values = returns[crash_mask]
print("Crash Values:", crash_values)
# Output should be [-0.05 -0.1 ]
2.10 Linear Algebra (The Dot Product)
In finance, one of the most common operations is the “Weighted Sum.”
- Portfolio Return: (Weight A × Return A) + (Weight B × Return B)…
- Moving Average: (Price T × 0.2) + (Price T-1 × 0.2)…
In math, this “sum of products” is called the Dot Product.
1. The Mathematical Concept
Given two vectors $A$ and $B$, the dot product is:
$$A \cdot B = \sum_{i=1}^{n} A_i \times B_i$$
2. The NumPy Implementation (np.dot or @)
Instead of multiplying arrays and then summing them (two steps), we do it in one step.
Financial Context: Calculating Portfolio Return
Imagine a portfolio with 3 assets.
Weights: 50% Stock A, 30% Stock B, 20% Stock C.
Returns: A=+10%, B=+5%, C=-2%.
import numpy as np
weights = np.array([0.50, 0.30, 0.20])
returns = np.array([0.10, 0.05, -0.02])
# Method 1: The “Manual” Way (Element-wise multiplication + Sum)
# Good for understanding, but verbose.
weighted_sum = np.sum(weights * returns)
print(f"Manual Calculation: {weighted_sum}")
# Method 2: The Dot Product (Preferred)
# Faster and cleaner syntax.
dot_prod = np.dot(weights, returns)
print(f"Dot Product: {dot_prod}")
# Method 3: The '@' Operator (Modern Python shortcut for dot product)
at_operator = weights @ returns
print(f"At Operator: {at_operator}")
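The moving-average bullet at the top of this section is the same dot product. This sketch reuses the 10-day price series from Section 2.8: the dot product of the last 5 prices with five equal 0.2 weights gives the latest 5-day average, and np.convolve slides that weight vector along the whole series (mode "valid" keeps only full windows). np.convolve reverses the weight vector, which makes no difference here because the weights are equal.

```python
import numpy as np

prices = np.array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109], dtype=float)
ma_weights = np.full(5, 0.2)  # 5 equal weights of 20%

# Latest 5-day moving average as a single dot product
latest_ma = prices[-5:] @ ma_weights
print(latest_ma)  # approximately 107.0

# Rolling 5-day averages over the whole series (one per full window)
rolling_ma = np.convolve(prices, ma_weights, mode="valid")
print(rolling_ma)  # approximately 102, 103, ..., 107
```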
2.11 Matrix Multiplication
Financial Context:
To calculate Portfolio Risk (Variance), element-wise multiplication of vectors is not enough. We need to combine a vector of weights ($w$) with a matrix of covariances ($\Sigma$).
This requires Matrix Multiplication.
$$\text{Variance} = w^T \cdot \Sigma \cdot w$$
w: Weights (1D array)
Σ (Sigma): Covariance Matrix (2D array)
NumPy handles this complex algebra automatically using the @ operator or np.dot.
# 1. Define Weights (2 assets: 60% / 40%)
w = np.array([0.6, 0.4])
# 2. Define Covariance Matrix (Risk relationship between assets)
# Diagonal = Variance of individual assets
# Off-diagonal = Covariance between assets
cov_matrix = np.array([
[0.04, 0.01], # Asset A risk (0.04) and link to B (0.01)
[0.01, 0.09] # Link to A (0.01) and Asset B risk (0.09)
])
# 3. Calculate Portfolio Variance: w @ Cov @ w
# Step A: Multiply Weights by Matrix
step1 = w @ cov_matrix
# Step B: Multiply Result by Weights again
port_variance = step1 @ w
print(f"Portfolio Variance: {port_variance}")
print(f"Portfolio Volatility (Std Dev): {np.sqrt(port_variance)}")
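For two assets the matrix product expands to a familiar formula, which makes a useful sanity check: $w_A^2\sigma_A^2 + w_B^2\sigma_B^2 + 2 w_A w_B \sigma_{AB}$. With the numbers above that is $0.36 \times 0.04 + 0.16 \times 0.09 + 2 \times 0.24 \times 0.01 = 0.0336$. The sketch computes both forms and compares them.

```python
import numpy as np

w = np.array([0.6, 0.4])
cov_matrix = np.array([
    [0.04, 0.01],
    [0.01, 0.09],
])

matrix_var = w @ cov_matrix @ w  # w^T * Sigma * w

# Expanded two-asset formula, term by term
w_a, w_b = w
var_a, var_b, cov_ab = cov_matrix[0, 0], cov_matrix[1, 1], cov_matrix[0, 1]
expanded_var = w_a**2 * var_a + w_b**2 * var_b + 2 * w_a * w_b * cov_ab

print(matrix_var, expanded_var)              # both approximately 0.0336
print(np.isclose(matrix_var, expanded_var))  # True
```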
2.12 Mini-Project: The Efficient Frontier Simulator
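We will now combine Random Generation, Vectorization, and Linear Algebra to simulate 1,000 different portfolios. Each simulated portfolio is one point in risk/return space; the upper-left edge of the resulting cloud approximates the Efficient Frontier, the set of portfolios with the highest expected return for each level of risk.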
Objective:
Given 3 stocks with specific expected returns and a covariance matrix, simulate 1,000 random weight combinations and calculate the risk/return for each.
import numpy as np
# --- Configuration ---
n_assets = 3
n_portfolios = 1000
# Expected Returns for 3 stocks
mean_returns = np.array([0.12, 0.18, 0.15])
# Covariance Matrix (Risk)
cov_matrix = np.array([
[0.05, 0.02, 0.01],
[0.02, 0.08, 0.03],
[0.01, 0.03, 0.07]
])
# Lists to collect results (converted to arrays after the loop)
results_ret = []
results_vol = []
# --- Simulation Loop ---
print("Simulating portfolios...")
for i in range(n_portfolios):
    # 1. Generate Random Weights
    weights = np.random.random(n_assets)
    weights = weights / np.sum(weights)  # Normalize so they sum to 100%
    # 2. Calculate Portfolio Return (Dot Product)
    ret = weights @ mean_returns
    results_ret.append(ret)
    # 3. Calculate Portfolio Variance (Matrix Algebra)
    # Formula: w^T * Cov * w
    var = weights @ cov_matrix @ weights
    vol = np.sqrt(var)
    results_vol.append(vol)
# --- Analysis ---
# Convert lists to arrays for analysis
results_ret = np.array(results_ret)
results_vol = np.array(results_vol)
max_ret_idx = np.argmax(results_ret)
min_vol_idx = np.argmin(results_vol)
print("Simulation Complete.")
print(f"Highest Return Found: {round(results_ret[max_ret_idx] * 100, 2)}% (Risk: {round(results_vol[max_ret_idx] * 100, 2)}%)")
print(f"Lowest Risk Found: {round(results_vol[min_vol_idx] * 100, 2)}% (Return: {round(results_ret[min_vol_idx] * 100, 2)}%)")
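In the spirit of this chapter, the simulation loop can itself be vectorized. The sketch below is one possible approach, not the only one: it draws all 1,000 weight vectors as a (1000, 3) matrix, normalizes each row with keepdims=True so the division broadcasts, and computes every portfolio's variance at once as the row-wise sum of (W @ Σ) * W, which equals $w^T \Sigma w$ for each row. It uses a seeded Generator (the seed is arbitrary), so its numbers will not match the loop above.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed for reproducibility
n_assets, n_portfolios = 3, 1000

mean_returns = np.array([0.12, 0.18, 0.15])
cov_matrix = np.array([
    [0.05, 0.02, 0.01],
    [0.02, 0.08, 0.03],
    [0.01, 0.03, 0.07],
])

# One row of weights per portfolio, each row normalized to sum to 1
W = rng.random((n_portfolios, n_assets))
W = W / W.sum(axis=1, keepdims=True)

port_returns = W @ mean_returns                   # shape (1000,)
port_vars = np.sum((W @ cov_matrix) * W, axis=1)  # w^T Sigma w for every row
port_vols = np.sqrt(port_vars)

best = np.argmax(port_returns)
safest = np.argmin(port_vols)
print(f"Highest Return: {port_returns[best]:.2%} (Risk: {port_vols[best]:.2%})")
print(f"Lowest Risk: {port_vols[safest]:.2%} (Return: {port_returns[safest]:.2%})")
```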
2.13 Chapter Summary
In this chapter, we replaced slow Python loops with fast NumPy arrays.
- Arrays: The core data structure for storing financial data (prices, returns).
- Vectorization: Applying math ($+$, $-$, $\ln$) to entire arrays at once.
- Indexing: Slicing data by time (rows) or asset (columns).
- Boolean Masking: Filtering data based on conditions (e.g., returns < 0).
- Linear Algebra: Using the Dot Product (@) to calculate weighted returns and portfolio variance.
Coming Up Next:
Now that we can do the math, we need to handle the messy reality of financial data: dates, missing values, and CSV files. In Chapter 3, we introduce pandas, one of the most widely used Python libraries for data analysis.
2.14 Review Questions
- Vectorization: How would you multiply every element in a NumPy array arr by 5?
a) arr.multiply(5)
b) arr * 5
c) for x in arr: x * 5
d) np.mult(arr, 5)
- Dimensions: If matrix.shape is (100, 50), what does this likely represent?
a) 100 stocks and 50 days.
b) 100 days and 50 stocks.
c) A single vector of length 5000.
- Linear Algebra: Which NumPy operator calculates the dot product (weighted sum)?
a) &
b) $
c) @
d) %
- Coding Challenge:
You have a portfolio with weights = np.array([0.2, 0.8]).
You have returns for today: ret = np.array([-0.05, 0.10]).
Calculate the portfolio return using the dot product.
Answers
1: (b) arr * 5
2: (b) It is standard convention to have Time as Rows (100 days) and Assets as Columns (50 stocks).
3: (c) @
4:
weights = np.array([0.2, 0.8])
ret = np.array([-0.05, 0.10])
port_ret = weights @ ret
# Calculation: (0.2 * -0.05) + (0.8 * 0.10) = -0.01 + 0.08 = 0.07 (7%)

