Unlock Portfolio Insights Using NumPy Arrays

Chapter 2

2.1 Learning Objectives

By the end of this chapter, you will be able to:

  1. Explain why standard Python lists are inefficient for large-scale financial data.
  2. Create and manipulate 1D (vectors) and 2D (matrices) NumPy arrays.
  3. Perform vectorized operations to calculate returns across thousands of assets simultaneously.
  4. Use Linear Algebra (dot products) to calculate portfolio variance and expected returns.

2.2 Introduction: The Need for Speed

In Chapter 1, we calculated the return of a stock using a for loop. While this works for a few data points, it fails at the institutional level.

The Problem:

Imagine analyzing the S&P 500 index.

  • 500 companies.
  • 20 years of daily data (~5,000 trading days).
  • Total data points: 500 × 5,000 = 2,500,000.

Looping through 2.5 million items in pure Python is slow.

The Solution: NumPy (Numerical Python)

NumPy provides a new data structure called the ndarray (N-dimensional array). A Python list stores references to objects that can be scattered across memory, whereas a NumPy array stores its values in one contiguous block of memory (as C or Fortran arrays do). This allows for Vectorization: applying a mathematical operation to an entire array at once, without writing a loop.

To use NumPy, we import it using the universal standard alias np.

import numpy as np
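To make the memory claim concrete, here is a minimal sketch that inspects how an ndarray is laid out. The array name `data_points` is only illustrative; it is sized to match the S&P 500 example above and assumes 64-bit floats.

```python
import numpy as np

# Hypothetical array sized like the S&P 500 example: 500 stocks x 5,000 days
data_points = np.zeros(500 * 5_000, dtype=np.float64)

print(data_points.flags["C_CONTIGUOUS"])  # True: one unbroken block of memory
print(data_points.itemsize)               # 8 bytes per float64 element
print(data_points.nbytes)                 # 20000000 bytes in total (about 20 MB)
```

Because every element has the same size and sits next to its neighbour, NumPy can hand the whole block to compiled code instead of visiting Python objects one at a time.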

 

2.3 The NumPy Array

The core building block of quantitative finance is the Array.

1. Creating Arrays from Lists

We can convert a standard Python list into a NumPy array.

Financial Context:

1D Array (Vector): Represents a single time series (e.g., closing prices of AAPL).

2D Array (Matrix): Represents a panel of data (e.g., Rows = Dates, Columns = Different Stocks).

import numpy as np

 

# A list of returns for 3 days

returns_list = [0.01, 0.02, -0.01]

 

# Convert to NumPy Array

returns_arr = np.array(returns_list)

 

print(returns_arr)

# Output: [ 0.01 0.02 -0.01] (Notice no commas in output)

print(type(returns_arr))

# Output: <class 'numpy.ndarray'>

 

2. Array Attributes

When loading data, it is critical to know its “shape” (dimensions).

  • .ndim: Number of dimensions (1 for vector, 2 for matrix).
  • .shape: The size of each dimension (Rows, Columns).
  • .dtype: The data type (e.g., float64, int32).

# A 2D Matrix: 2 Stocks (Rows), 3 Days of Prices (Cols)

price_matrix = np.array([

[100, 101, 102], # Stock A

[50, 48, 49] # Stock B

])

 

print(f"Dimensions: {price_matrix.ndim}") # 2

print(f"Shape: {price_matrix.shape}") # (2, 3) -> 2 Rows, 3 Cols

print(f"Data Type: {price_matrix.dtype}") # int64 (or int32)

 

int (Integer): Whole numbers without a fraction (e.g., 10, -5).

float (Floating Point): Numbers with a decimal point (e.g., 3.14, -0.001).
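The distinction matters because NumPy picks one dtype for the whole array. The sketch below, using made-up price values, shows how a single decimal value changes the dtype and how to convert explicitly with `astype`.

```python
import numpy as np

int_prices = np.array([100, 101, 102])       # integers only -> an integer dtype
float_prices = np.array([100, 101.5, 102])   # one decimal value -> float64 for all

print(int_prices.dtype.kind)   # i (integer; the exact width depends on the platform)
print(float_prices.dtype)      # float64

# Convert explicitly when fractional values (such as returns) are needed
as_float = int_prices.astype(np.float64)
print(as_float.dtype)          # float64
```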

2.4 Generating Financial Data

Often in modeling, we need to generate data from scratch, such as creating a dummy portfolio or setting up a simulation grid. NumPy has specific functions for this.

1. np.zeros() and np.ones()

Useful for initializing weights in a portfolio. If we have an empty portfolio, we start with zeros.

# Create an array of 5 zeros (representing 0% allocation to 5 stocks)

weights = np.zeros(5)

print(weights)

# Output: [0. 0. 0. 0. 0.]

 

2. np.arange() (Array Range)

Similar to Python’s range() but returns an array. Useful for generating time indices (e.g., “Day 1 to Day 10”).

days = np.arange(1, 11) # Start at 1, stop before 11

print(days)

# Output: [ 1 2 3 4 5 6 7 8 9 10]

 

3. np.linspace() (Linear Space)

Crucial for Sensitivity Analysis.

Financial Context:

You want to price a bond under different yield assumptions. You need 5 scenarios ranging from 1% to 5%. linspace generates evenly spaced numbers.

# Generate 5 rates between 0.01 and 0.05

rates = np.linspace(0.01, 0.05, 5)

 

print(rates)

# Output: [0.01 0.02 0.03 0.04 0.05]
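As a rough illustration of the sensitivity analysis described above, the sketch below prices a zero-coupon bond under each rate scenario in one vectorized step. The face value of 100 and the 5-year maturity are assumptions made for this example, not figures from the chapter.

```python
import numpy as np

face_value = 100.0   # assumed face value
years = 5            # assumed time to maturity

rates = np.linspace(0.01, 0.05, 5)

# Zero-coupon bond price: face / (1 + r)^T, evaluated for every rate at once
bond_prices = face_value / (1 + rates) ** years

print(np.round(bond_prices, 2))
```

Each element of `bond_prices` corresponds to one yield scenario, and the prices fall as the yield rises.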

 

4. np.random (Random Number Generation)

We used random in Chapter 1. NumPy’s random module is much faster and can generate entire matrices at once. This is the engine for Monte Carlo simulations.

# Generate 10 random daily returns (Mean=0, Std Dev=1) – Standard Normal

random_returns = np.random.normal(0, 1, 10)

print(random_returns)
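NumPy also offers a newer generator interface, `np.random.default_rng`, which makes it easy to seed a simulation so results can be reproduced. The sketch below is one way to use it; the seed, the mean of 0.05%, and the standard deviation of 1% are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is arbitrary

# 5 days (rows) x 3 stocks (columns) of simulated daily returns
simulated_returns = rng.normal(loc=0.0005, scale=0.01, size=(5, 3))
print(simulated_returns.shape)   # (5, 3)

# Re-creating the generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
same_returns = rng_again.normal(loc=0.0005, scale=0.01, size=(5, 3))
print(np.array_equal(simulated_returns, same_returns))   # True
```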

 

Check Your Understanding

  • Exercise 2.1 (Shape Analysis):

Create a 2D array representing the prices of 3 stocks over 5 days. Use random integers between 100 and 200 to fill it. Print the .shape of the array. Hint: Use np.random.randint().

  • Exercise 2.2 (Scenario Generation):

You are stress-testing a portfolio. You need to test how it performs if the market drops by -10%, -20%, -30%, -40%, and -50%. Use np.linspace or np.arange to generate an array of these 5 stress scenarios (as decimals: -0.10 to -0.50).

# Solution 2.1

# 3 Stocks (Rows), 5 Days (Cols)

stock_data = np.random.randint(100, 200, size=(3, 5))

print("Stock Data:\n", stock_data)

print("Shape:", stock_data.shape) # Should be (3, 5)

 

# Solution 2.2

# Using linspace to get exactly 5 points

stress_scenarios = np.linspace(-0.10, -0.50, 5)

print("Stress Scenarios:", stress_scenarios)

 

2.5 Vectorized Operations

In Chapter 1, if we wanted to add $10 to a list of stock prices, we had to write a loop.

In NumPy, we simply add 10 to the array. This is called Vectorization.

1. Element-Wise Arithmetic

Vectorization means applying an operation to an entire array at once. NumPy handles the looping internally in optimized C code, which on large arrays is often 50-100x faster than an equivalent Python loop (the exact speedup depends on the operation and the array size).

Comparison: The “Loop” way vs. The “NumPy” way

Financial Context:

We have prices for 3 stocks, and we want to apply a $1.50 dividend adjustment to all of them.

import numpy as np

 

# Standard Python List approach (Slow & Verbose)

prices_list = [100.0, 50.0, 25.0]

adjusted_list = []

for p in prices_list:

    adjusted_list.append(p + 1.50)

 

# NumPy Approach (Fast & Clean)

prices_arr = np.array([100.0, 50.0, 25.0])

adjusted_arr = prices_arr + 1.50 # <-- Vectorization in action!

 

print(adjusted_arr)

# Output: [101.5 51.5 26.5]

 

2. Operations Between Arrays

We can also perform math between two arrays. This works element-by-element.

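Condition: The arrays must usually be the same shape (e.g., both are length 3). NumPy can also combine arrays of different but compatible shapes through a mechanism called broadcasting.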

Financial Context: Calculating Daily P&L

We have the number of shares held and the price change for a portfolio of 3 stocks.

$$\text{P\&L} = \text{Shares} \times (\text{Price}_{\text{close}} - \text{Price}_{\text{open}})$$

shares = np.array([100, 50, 200])

price_open = np.array([150.0, 200.0, 50.0])

price_close = np.array([155.0, 198.0, 52.0])

 

# Step 1: Calculate Price Change (Vector Subtraction)

price_change = price_close - price_open

# Result: [5.0, -2.0, 2.0]

 

# Step 2: Calculate Profit (Vector Multiplication)

daily_profit = shares * price_change

 

print(f"Profit per stock: {daily_profit}")

# Output: [ 500. -100. 400.]
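The word "usually" in the shape condition hints at broadcasting: NumPy can combine a 2D array with a 1D array when the trailing dimensions match. The sketch below is a small illustration that stacks the open and close prices from the example into a hypothetical `price_matrix` and multiplies every row by the same `shares` vector.

```python
import numpy as np

# 2 price snapshots (rows) x 3 stocks (columns), reusing the open/close prices
price_matrix = np.array([
    [150.0, 200.0, 50.0],   # open
    [155.0, 198.0, 52.0],   # close
])
shares = np.array([100, 50, 200])

# shares has shape (3,), so it is applied to each row of the (2, 3) matrix
position_values = price_matrix * shares

print(position_values.shape)   # (2, 3)
print(position_values)
```

Each row of `position_values` holds the market value of the three positions at that snapshot.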

 

2.6 Universal Functions (ufuncs)

Finance involves more than just + and -. We need logarithms for returns, square roots for volatility, and exponents for compounding. NumPy provides Universal Functions (ufuncs) for this.

1. Log Returns (np.log)

Financial Context:

The standard for measuring returns in quantitative finance is the natural logarithm.

$$r_{\log} = \ln P_t - \ln P_{t-1}$$

prices = np.array([100.0, 102.0, 105.0, 103.0])

 

# First step toward log returns: take the log of the whole price array

log_prices = np.log(prices)

 

print(log_prices)

# Output: [4.605… 4.624… 4.653… 4.634…]
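Taking the log of the prices is only the first half of the formula; the return is the difference between consecutive log prices. A minimal sketch of that second step, using `np.diff` on the same price array:

```python
import numpy as np

prices = np.array([100.0, 102.0, 105.0, 103.0])

# np.diff subtracts each element from the next: ln(P_t) - ln(P_{t-1})
log_returns = np.diff(np.log(prices))
print(log_returns.shape)   # (3,) -> one fewer return than prices

# Log returns add up over time: their sum equals ln(last price / first price)
total_log_return = log_returns.sum()
print(np.isclose(total_log_return, np.log(103.0 / 100.0)))   # True
```

This additivity is one reason log returns are convenient for multi-period analysis.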

 

2. Square Root (np.sqrt)

Financial Context:

Used for volatility scaling.

variance = 0.04

volatility = np.sqrt(variance)

print(volatility) # 0.2

 

Volatility scaling is a technique used in quantitative finance to adjust the exposure of an investment or a time-series estimate based on risk. It typically refers to two distinct but related concepts: scaling across time and scaling for position sizing.

1. Scaling Across Time (The Square Root Rule)

In risk management, you often need to convert volatility from one timeframe to another (e.g., daily to annual). Under the assumption that returns are independent and identically distributed (i.i.d.), volatility scales with the square root of time.

$$\sigma_T = \sigma_1 \times \sqrt{T}$$

Example: To annualize a daily volatility of 1%, you multiply it by the square root of the number of trading days in a year (typically 252).

$$1\% \times \sqrt{252} \approx 15.87\%$$
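The same annualization can be checked in NumPy. This is a small sketch of the square root rule under the i.i.d. assumption stated above; the variable names are illustrative.

```python
import numpy as np

daily_vol = 0.01      # 1% daily volatility
trading_days = 252    # typical number of trading days in a year

annual_vol = daily_vol * np.sqrt(trading_days)
print(round(annual_vol, 4))   # 0.1587
```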

2. Volatility Targeting (Position Sizing)

This is a strategic application where a portfolio manager adjusts leverage to maintain a constant level of risk. The goal is to ensure that a strategy contributes the same amount of risk regardless of whether the market is calm or turbulent.

High Volatility: Reduce position size (de-leverage).

Low Volatility: Increase position size (leverage up).

The Weighting Formula:

The weight ($W_t$) assigned to an asset is inversely proportional to its forecasted volatility ($\sigma_t$):

$$W_t = \frac{\text{Target Volatility}}{\sigma_t}$$
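Because the formula is a simple division, it vectorizes directly. The sketch below applies it to three hypothetical volatility forecasts; the 10% target and the forecast values are assumptions chosen for illustration.

```python
import numpy as np

target_vol = 0.10                                # assumed target volatility
forecast_vol = np.array([0.05, 0.10, 0.20])      # assumed forecasts: calm, normal, turbulent

# W_t = target volatility / forecasted volatility, for every regime at once
position_weights = target_vol / forecast_vol
print(position_weights)
```

The calm regime gets a weight above 1 (leverage up), while the turbulent regime gets a weight below 1 (de-leverage).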

3. Absolute Value (np.abs)

Financial Context:

Useful for calculating “tracking error” or deviation magnitude without worrying about direction (positive/negative).

returns = np.array([-0.05, 0.03, 0.01])

magnitudes = np.abs(returns)

print(magnitudes)

# Output: [0.05 0.03 0.01]

 

2.7 Aggregation and Axes

Once we have calculated returns for 500 stocks, we usually want to summarize them: “What was the average return?” or “What was the maximum loss?”

1. Simple Aggregations (sum, mean, min, max, std)

portfolio_returns = np.array([0.05, 0.02, 0.01, 0.03])

 

print(f"Total Return: {np.sum(portfolio_returns)}")

print(f"Average Return: {np.mean(portfolio_returns)}")

print(f"Volatility (Std Dev): {np.std(portfolio_returns)}")

print(f"Worst Return (Min): {np.min(portfolio_returns)}")
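One detail worth knowing: by default `np.std` divides by N (the population formula). When the returns are a sample from a longer history, many practitioners prefer the sample formula, which divides by N - 1 and is selected with `ddof=1`. A short sketch of the difference:

```python
import numpy as np

portfolio_returns = np.array([0.05, 0.02, 0.01, 0.03])

population_std = np.std(portfolio_returns)        # default ddof=0: divide by N
sample_std = np.std(portfolio_returns, ddof=1)    # ddof=1: divide by N - 1

print(sample_std > population_std)   # True: the sample estimate is slightly larger
```

With only four observations the gap is noticeable; with thousands of daily returns it becomes negligible.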

 

2. The axis Parameter (Critical for 2D Data)

When working with a 2D Matrix (Rows=Time, Cols=Stocks), we can aggregate in two directions:

  • axis=0: “Collapse the rows.” (Calculate down the column). Used to find the average return for each stock over time.
  • axis=1: “Collapse the columns.” (Calculate across the row). Used to find the average return of the entire portfolio on a specific day.

Financial Context:

We have a matrix of returns for 2 stocks over 3 days.

# 3 Days (Rows), 2 Stocks (Cols)

returns_matrix = np.array([

[0.01, 0.05], # Day 1: Stock A=1%, Stock B=5%

[0.02, -0.01], # Day 2: Stock A=2%, Stock B=-1%

[0.01, 0.02] # Day 3: Stock A=1%, Stock B=2%

])

 

# Scenario A: What is the average return for EACH STOCK? (Down the columns)

avg_stock_returns = np.mean(returns_matrix, axis=0)

print("Average per Stock:", avg_stock_returns)

# Output: [0.0133… 0.02] (Stock A avg, Stock B avg)

 

# Scenario B: What is the DAILY performance of the portfolio? (Across the rows)

daily_portfolio_return = np.mean(returns_matrix, axis=1)

print("Daily Portfolio Performance:", daily_portfolio_return)

# Output: [0.03 0.005 0.015] (Day 1 avg, Day 2 avg, Day 3 avg)

 

Check Your Understanding

  • Exercise 2.3 (Vectorized Impact):

You have an array of prices: prices = np.array([100, 102, 104, 106]).

A market crash occurs, and all prices drop by 20%. Write one line of code to create a new array crashed_prices.

  • Exercise 2.4 (Filtering with Logic):

Bonus Concept: You can filter arrays using conditions like prices[prices > 100].

Given returns = np.array([0.05, -0.02, 0.03, -0.04, 0.01]), write code to:

a) Create a boolean mask identifying all negative returns.

b) Calculate the mean of only the negative returns (Average Loss).

# Solution 2.3

prices = np.array([100, 102, 104, 106])

crashed_prices = prices * (1 - 0.20)

print("Crashed:", crashed_prices)

 

# Solution 2.4

returns = np.array([0.05, -0.02, 0.03, -0.04, 0.01])

 

# a) Boolean Mask

negative_mask = returns < 0

print("Negative Days:", negative_mask) # [False, True, False, True, False]

 

# b) Filter and Mean

negative_returns = returns[negative_mask]

average_loss = np.mean(negative_returns)

print("Average Loss:", average_loss) # -0.03

 

2.8 Indexing and Slicing

In financial data analysis, you rarely use the entire dataset at once. You might want the “first 30 days,” “only the tech stocks,” or “just the closing prices.” NumPy offers powerful tools to slice data efficiently.

1. 1D Slicing (Time Series)

Slicing a 1D array works exactly like slicing a Python list: [start:stop:step].

Financial Context:

We have 10 days of stock prices. We want to analyze the first week (first 5 days) and the trend of the last 2 days.

import numpy as np

 

prices = np.array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

 

# Select first 5 days (Indices 0, 1, 2, 3, 4)

first_week = prices[0:5]

print("First Week:", first_week)

 

# Select the last 2 days (Indices -2 to end)

recent_trend = prices[-2:]

print("Recent Trend:", recent_trend)
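One difference from Python lists is worth a quick sketch: a NumPy slice is a view onto the original data, not a copy, so writing to the slice changes the original array. The example below uses a shortened, made-up price array to show this and how `.copy()` avoids it.

```python
import numpy as np

prices = np.array([100, 101, 102, 103, 104])

# A slice is a view: it shares memory with the original array
first_two = prices[0:2]
first_two[0] = 999
print(prices[0])   # 999 -> the original array changed too

# Use .copy() when you need an independent array
safe_copy = prices[0:2].copy()
safe_copy[1] = -1
print(prices[1])   # 101 -> the original is untouched
```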

 

2. 2D Slicing (The Data Matrix)

This is where NumPy shines. When you have a 2D matrix (Rows=Time, Columns=Stocks), you slice using a comma to separate the dimensions: [rows, columns].

The Syntax:

matrix[ start_row:end_row , start_col:end_col ]

Financial Context:

Imagine a matrix where:

Rows represent Days (Day 0, Day 1, Day 2).

Columns represent Stocks (Stock A, Stock B, Stock C).

# Create a 3×3 Matrix

data = np.array([

[100, 50, 25], # Day 0 prices for A, B, C

[101, 52, 26], # Day 1

[102, 51, 24] # Day 2

])

 

# Scenario A: Select specific STOCK (Column Slicing)

# We want all days (:) for Stock B (Index 1)

stock_b_prices = data[:, 1]

print("Stock B Prices:", stock_b_prices)

# Output: [50 52 51]

 

# Scenario B: Select specific DAY (Row Slicing)

# We want Day 0 (Index 0) for all stocks (:)

day_0_prices = data[0, :]

print("Day 0 Prices:", day_0_prices)

# Output: [100 50 25]

 

# Scenario C: A Sub-section

# First 2 days, First 2 stocks

sub_section = data[0:2, 0:2]

print("Sub-section:\n", sub_section)
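Slicing and vectorization combine naturally. As a sketch, the code below computes simple daily returns for every stock at once by dividing "all rows except the first" by "all rows except the last". It rebuilds the same 3×3 `data` matrix so that it runs on its own.

```python
import numpy as np

data = np.array([
    [100, 50, 25],   # Day 0 prices for A, B, C
    [101, 52, 26],   # Day 1
    [102, 51, 24],   # Day 2
])

# Today's prices (rows 1..end) divided by yesterday's (rows 0..end-1), minus 1
simple_returns = data[1:, :] / data[:-1, :] - 1

print(simple_returns.shape)   # (2, 3): 2 daily returns for each of 3 stocks
print(simple_returns)
```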

 

2.9 Boolean Indexing (Filtering)

Often, we don’t know the position (index) of the data we want; we only know the condition.

  1. “Show me returns that are negative.”
  2. “Find days where price > 100.”

This is called Boolean Indexing or Masking.

1. Creating the Mask

First, we ask a question. NumPy returns an array of True and False.

returns = np.array([0.05, -0.02, 0.03, -0.01, 0.04])

 

# Question: Which days were losses?

is_loss = returns < 0

 

print(is_loss)

# Output: [False True False True False]

 

2. Applying the Mask

We use the boolean array inside square brackets [] to filter the original data.

# Filter the returns array to keep only losses

losses = returns[is_loss]

 

print("Losses:", losses)

# Output: [-0.02 -0.01]

 

3. Outlier Detection

Financial Context:

In risk management, we often care about extreme events. Let’s filter for days where the movement was “significant” (more than +/- 3%).

daily_moves = np.array([0.01, -0.04, 0.005, 0.06, 0.01])

 

# We use np.abs() to check magnitude regardless of direction

extreme_mask = np.abs(daily_moves) > 0.03

 

significant_days = daily_moves[extreme_mask]

print("Extreme Events:", significant_days)

# Output: [-0.04 0.06]
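Masks can also be combined, counted, and used to replace values. The sketch below shows three common patterns on a small returns array; the thresholds are arbitrary values chosen for illustration.

```python
import numpy as np

returns = np.array([0.05, -0.02, 0.03, -0.01, 0.04])

# Combine conditions with & (and) or | (or); each condition needs parentheses
moderate_gain = (returns > 0) & (returns < 0.045)
print(returns[moderate_gain])   # [0.03 0.04]

# True counts as 1, so summing a mask counts the matching days
n_losses = np.sum(returns < 0)
print(n_losses)   # 2

# np.where(condition, value_if_true, value_if_false) replaces instead of filtering
floored = np.where(returns < 0, 0.0, returns)
print(floored)
```

Unlike filtering, `np.where` keeps the array the same length, which is useful when the result must stay aligned with dates.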

 

Check Your Understanding

  • Exercise 2.5 (Matrix Extraction):

You have a matrix prices of shape (5, 4) representing 5 days and 4 stocks.

Write code to extract the history of the last stock (the last column) for all days.

  • Exercise 2.6 (Crisis Detector):

You have an array of returns: returns = np.array([-0.05, 0.02, 0.01, -0.10, 0.04]).

Write a script that:

a) Creates a mask for “Crash Days” (returns less than -0.04).

b) Uses the mask to print the specific values of those crash returns.

# Solution 2.5

# We simulate the data first

prices = np.random.randint(100, 200, size=(5, 4))

# Slice: All rows (:), Last column (-1)

last_stock = prices[:, -1]

print("Last Stock History:", last_stock)

 

# Solution 2.6

returns = np.array([-0.05, 0.02, 0.01, -0.10, 0.04])

 

# a) Create Mask

crash_mask = returns < -0.04

 

# b) Apply Mask

crash_values = returns[crash_mask]

print("Crash Values:", crash_values)

# Output should contain -0.05 and -0.10

 

2.10 Linear Algebra (The Dot Product)

In finance, the most common operation is the “Weighted Sum.”

  • Portfolio Return: (Weight A × Return A) + (Weight B × Return B)…
  • Moving Average: (Price T × 0.2) + (Price T-1 × 0.2)…

In math, this “sum of products” is called the Dot Product.

1. The Mathematical Concept

Given two vectors $A$ and $B$, the dot product is:

$$A \cdot B = \sum_{i=1}^{n} A_i \times B_i$$

2. The NumPy Implementation (np.dot or @)

Instead of multiplying arrays and then summing them (two steps), we do it in one step.

Financial Context: Calculating Portfolio Return

Imagine a portfolio with 3 assets.

Weights: 50% Stock A, 30% Stock B, 20% Stock C.

Returns: A=+10%, B=+5%, C=-2%.

import numpy as np

 

weights = np.array([0.50, 0.30, 0.20])

returns = np.array([0.10, 0.05, -0.02])

 

# Method 1: The “Manual” Way (Element-wise multiplication + Sum)

# Good for understanding, but verbose.

weighted_sum = np.sum(weights * returns)

print(f"Manual Calculation: {weighted_sum}")

 

# Method 2: The Dot Product (Preferred)

# Faster and cleaner syntax.

dot_prod = np.dot(weights, returns)

print(f"Dot Product: {dot_prod}")

 

# Method 3: The '@' Operator (Modern Python shortcut for dot product)

at_operator = weights @ returns

print(f"At Operator: {at_operator}")
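All three methods compute the same weighted sum. Worked by hand with the stated weights (50%, 30%, 20%) and returns (+10%, +5%, -2%), writing $w$ for the weights and $r$ for the returns, the arithmetic is:

```latex
$$w \cdot r = (0.50)(0.10) + (0.30)(0.05) + (0.20)(-0.02) = 0.050 + 0.015 - 0.004 = 0.061$$
```

So the portfolio return is about 6.1%, up to floating-point rounding in the printed value.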

  

2.11 Matrix Multiplication

Financial Context:

To calculate Portfolio Risk (Variance), we cannot just multiply vectors. We need to interact a vector of weights ($w$) with a matrix of covariances (Σ).

This requires Matrix Multiplication.

$$\text{Variance} = w^T \cdot \Sigma \cdot w$$

w: Weights (1D array)

Σ (Sigma): Covariance Matrix (2D array)

NumPy handles this complex algebra automatically using the @ operator or np.dot.

# 1. Define Weights (2 assets: 60% / 40%)

w = np.array([0.6, 0.4])

 

# 2. Define Covariance Matrix (Risk relationship between assets)

# Diagonal = Variance of individual assets

# Off-diagonal = Covariance between assets

cov_matrix = np.array([

[0.04, 0.01], # Asset A risk (0.04) and link to B (0.01)

[0.01, 0.09] # Link to A (0.01) and Asset B risk (0.09)

])

 

# 3. Calculate Portfolio Variance: w @ Cov @ w

# Step A: Multiply Weights by Matrix

step1 = w @ cov_matrix

 

# Step B: Multiply Result by Weights again

port_variance = step1 @ w

 

print(f"Portfolio Variance: {port_variance}")

print(f"Portfolio Volatility (Std Dev): {np.sqrt(port_variance)}")
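For two assets, the matrix formula expands to the familiar textbook expression: each squared weight times its variance, plus twice the product of the weights times the covariance. The sketch below checks that this manual expansion matches the matrix calculation, using the same weights and covariance matrix.

```python
import numpy as np

w = np.array([0.6, 0.4])
cov_matrix = np.array([
    [0.04, 0.01],
    [0.01, 0.09],
])

# Two-asset expansion: w1^2 * var1 + w2^2 * var2 + 2 * w1 * w2 * cov12
manual_variance = (
    w[0] ** 2 * cov_matrix[0, 0]
    + w[1] ** 2 * cov_matrix[1, 1]
    + 2 * w[0] * w[1] * cov_matrix[0, 1]
)

matrix_variance = w @ cov_matrix @ w

print(round(manual_variance, 4))                      # 0.0336
print(np.isclose(manual_variance, matrix_variance))   # True
```

The manual expansion grows quickly with more assets, which is why the matrix form is preferred.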

 

2.12 Mini-Project: The Efficient Frontier Simulator

We will now combine Random Generation, Vectorization, and Linear Algebra to simulate 1,000 different portfolios. This allows us to visualize the trade-off between Risk and Return (The Efficient Frontier).

Objective:

Given 3 stocks with specific expected returns and a covariance matrix, simulate 1,000 random weight combinations and calculate the risk/return for each.

import numpy as np

 

# --- Configuration ---

n_assets = 3

n_portfolios = 1000

 

# Expected Returns for 3 stocks

mean_returns = np.array([0.12, 0.18, 0.15])

 

# Covariance Matrix (Risk)

cov_matrix = np.array([

[0.05, 0.02, 0.01],

[0.02, 0.08, 0.03],

[0.01, 0.03, 0.07]

])

 

# Arrays to store results

results_ret = []

results_vol = []

 

# --- Simulation Loop ---

print("Simulating portfolios...")

 

for i in range(n_portfolios):

    # 1. Generate Random Weights

    weights = np.random.random(n_assets)

    weights = weights / np.sum(weights) # Normalize so they sum to 100%

    # 2. Calculate Portfolio Return (Dot Product)

    ret = weights @ mean_returns

    results_ret.append(ret)

    # 3. Calculate Portfolio Variance (Matrix Algebra)

    # Formula: w^T @ Cov @ w

    var = weights @ cov_matrix @ weights

    vol = np.sqrt(var)

    results_vol.append(vol)

 

# --- Analysis ---

# Convert lists to arrays for analysis

results_ret = np.array(results_ret)

results_vol = np.array(results_vol)

 

max_ret_idx = np.argmax(results_ret)

min_vol_idx = np.argmin(results_vol)

 

print("Simulation Complete.")

print(f"Highest Return Found: {round(results_ret[max_ret_idx] * 100, 2)}% (Risk: {round(results_vol[max_ret_idx]*100, 2)}%)")

print(f"Lowest Risk Found: {round(results_vol[min_vol_idx] * 100, 2)}% (Return: {round(results_ret[min_vol_idx]*100, 2)}%)")
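The loop above is easy to read, but the same simulation can be written without a Python loop by generating all weight vectors as one matrix. The following is one possible vectorized sketch, not the only way to do it; it uses `np.random.default_rng` with an arbitrary seed, and the names `all_weights`, `all_returns`, and `all_vols` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed for reproducibility

n_assets, n_portfolios = 3, 1000
mean_returns = np.array([0.12, 0.18, 0.15])
cov_matrix = np.array([
    [0.05, 0.02, 0.01],
    [0.02, 0.08, 0.03],
    [0.01, 0.03, 0.07],
])

# One row of weights per portfolio; normalize each row so it sums to 1
all_weights = rng.random((n_portfolios, n_assets))
all_weights = all_weights / all_weights.sum(axis=1, keepdims=True)

# (1000, 3) @ (3,) -> (1000,): one expected return per portfolio
all_returns = all_weights @ mean_returns

# Row-wise w^T @ Cov @ w: multiply by Cov, then take the dot product with each row
all_vols = np.sqrt(np.sum((all_weights @ cov_matrix) * all_weights, axis=1))

print(all_returns.shape, all_vols.shape)   # (1000,) (1000,)
```

With 1,000 portfolios the speed difference is small, but the vectorized form scales better as the number of simulated portfolios grows.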

 

2.13 Chapter Summary

In this chapter, we replaced slow Python loops with fast NumPy arrays.

  • Arrays: The core data structure for storing financial data (prices, returns).
  • Vectorization: Applying math ($+$, $-$, $\ln$) to entire arrays at once.
  • Indexing: Slicing data by time (rows) or asset (columns).
  • Boolean Masking: Filtering data based on conditions (e.g., returns < 0).
  • Linear Algebra: Using the Dot Product (@) to calculate weighted returns and portfolio variance.

Coming Up Next:

Now that we can do the math, we need to handle the messy reality of financial data: dates, missing values, and CSV files. In Chapter 3, we introduce Pandas, the most popular tool in data science.

2.14 Review Questions

  • Vectorization: How would you multiply every element in a NumPy array arr by 5?

a) arr.multiply(5)

b) arr * 5

c) for x in arr: x * 5

d) np.mult(arr, 5)

  • Dimensions: If matrix.shape is (100, 50), what does this likely represent?

a) 100 stocks and 50 days.

b) 100 days and 50 stocks.

c) A single vector of length 5000.

  • Linear Algebra: Which NumPy operator calculates the dot product (weighted sum)?

a) &

b) $

c) @

d) %

  • Coding Challenge:

You have a portfolio with weights = np.array([0.2, 0.8]).

You have returns for today: ret = np.array([-0.05, 0.10]).

Calculate the portfolio return using the dot product.

Answers

1: (b) arr * 5

2: (b) It is standard convention to have Time as Rows (100 days) and Assets as Columns (50 stocks).

3: (c) @

4:

weights = np.array([0.2, 0.8])

ret = np.array([-0.05, 0.10])

port_ret = weights @ ret

# Calculation: (0.2 * -0.05) + (0.8 * 0.10) = -0.01 + 0.08 = 0.07 (7%)

 

