From Numbers to Insight: Visualizing and Analyzing Financial Data

Welcome to the final article in our foundational series, Python for Quants. We’ve come a long way: we set up our environment, learned to think about code like a quant, and mastered the data handling power of Pandas.

Today, we bring it all together. Our goal is to perform Exploratory Data Analysis (EDA) on our financial data. This is the process where we move from just holding data to understanding it. We’ll calculate key financial metrics, analyze relationships between assets, and create the insightful visualizations that are the hallmark of a professional quant.

From Prices to Returns: The First Step of Analysis

Absolute price is interesting, but for most financial analysis, we care about returns. Returns normalize the data and are the primary input for risk and strategy models. The most common type is the daily percentage change. Pandas makes this incredibly easy.

Let’s fetch data for a few tech stocks and the SPY ETF (which tracks the S&P 500) to analyze their relationships.

import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns

# Define tickers and date range
tickers = ['AAPL', 'MSFT', 'GOOG', 'SPY']
start_date = '2022-01-01'
end_date = '2023-12-31'

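# Fetch adjusted close prices. With auto_adjust=True, yfinance adjusts the
# 'Close' column for splits and dividends, so we don't rely on the library default.
adj_close_df = yf.download(tickers, start=start_date, end=end_date, auto_adjust=True)['Close']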

# Calculate daily percentage returns using the built-in pct_change() method
returns_df = adj_close_df.pct_change().dropna()

print("Adjusted Close Prices (last 5 days):")
print(adj_close_df.tail())

print("\nDaily Returns (last 5 days):")
print(returns_df.tail())

Step-by-Step Explanation:

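  1. We download the adjusted closing price for our list of tickers (the 'Close' column, which yfinance adjusts for splits and dividends when auto_adjust is enabled). yfinance handles the multiple tickers gracefully, returning a DataFrame with one column per stock.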
  2. .pct_change() calculates the percentage change from the previous row for each column. This one command saves us from writing a manual loop.
  3. .dropna() removes the first row of the returns DataFrame, which will be NaN since there’s no prior day to calculate a return from.

Expected Output (exact prices can differ slightly depending on your yfinance version and its dividend adjustments):

[*********************100%***********************]  4 of 4 completed
Adjusted Close Prices (last 5 days):
                  AAPL        GOOG        MSFT         SPY
Date
2023-12-22  193.600006  142.720001  374.579987  473.649994
2023-12-26  193.050003  142.820007  374.660004  475.640015
2023-12-27  193.149994  141.490005  374.070007  476.510010
2023-12-28  193.580002  141.279999  375.279999  476.679993
2023-12-29  192.529999  140.929993  376.040009  475.309998

Daily Returns (last 5 days):
                AAPL      GOOG      MSFT       SPY
Date
2023-12-22  0.005505  0.007409  0.002248  0.001938
2023-12-26 -0.002841  0.000701  0.000214  0.004199
2023-12-27  0.000518 -0.009312 -0.001575  0.001827
2023-12-28  0.002226 -0.001484  0.003235  0.000357
2023-12-29 -0.005424 -0.002477  0.002025 -0.002874
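To see what .pct_change() is doing under the hood, here is a minimal sketch on a tiny, made-up price series (the numbers are illustrative, not market data). It checks the built-in method against the manual formula, today's price divided by yesterday's price minus one, and shows how daily returns compound back into a total return.

```python
import pandas as pd

# Hypothetical closing prices for one asset (illustrative values, not market data)
prices = pd.Series([100.0, 102.0, 99.96, 104.958], name="price")

# Built-in: percentage change from the previous row (first row is NaN)
builtin_returns = prices.pct_change()

# Manual equivalent: today's price divided by yesterday's price, minus one
manual_returns = prices / prices.shift(1) - 1

# Daily returns compound multiplicatively back into the total return
total_return = (1 + builtin_returns.dropna()).prod() - 1

print(pd.DataFrame({"pct_change": builtin_returns, "manual": manual_returns}))
print(f"Total return over the period: {total_return:.4%}")
```

The two columns match row for row, and the compounded total equals the last price divided by the first price, minus one.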

Visualizing Relationships: The Correlation Heatmap

How do these assets move in relation to one another? Correlation is the statistical measure for this, and it’s the foundation of modern portfolio theory. A heatmap is the most intuitive way to visualize a correlation matrix.

We’ll use Seaborn, a high-level plotting library built on Matplotlib, which makes statistical plots like this a breeze.

# Calculate the correlation matrix
correlation_matrix = returns_df.corr()

# Plot the heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Matrix of Daily Returns', fontsize=16)
plt.show()

Step-by-Step Explanation:

  1. returns_df.corr() calculates the pairwise correlation of columns, returning a new DataFrame (the correlation matrix).
  2. sns.heatmap(...) is the Seaborn function to create the plot.
  3. annot=True displays the numerical correlation values on the map.
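  4. cmap='coolwarm' sets the color scheme, where warm colors (red) mark the highest correlations in the matrix and cool colors (blue) mark the lowest. By default the color scale spans only the values present in the data; pass vmin=-1 and vmax=1 if you want it to cover the full correlation range.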

Expected Output:

[Image: The correlation heatmap showing high correlation between the tech stocks and between each stock and SPY.]

From the heatmap, we can instantly see that the tech stocks are all highly correlated with each other (values > 0.6) and with the broader market (SPY). This tells a quant that holding only these three stocks doesn’t provide much diversification.
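Why would stocks in the same sector be so correlated? One simple way to build intuition is a toy model in which every stock's return is a shared market move plus its own noise. The sketch below uses synthetic data under that assumption (the column names STOCK_A, STOCK_B and MARKET are hypothetical) and also verifies .corr() by hand: Pearson correlation is the covariance divided by the product of the two standard deviations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible
n_days = 500

# Hypothetical model: each stock's return = shared market move + its own noise
market = rng.normal(0.0, 0.01, n_days)
synthetic_returns = pd.DataFrame({
    "STOCK_A": market + rng.normal(0.0, 0.005, n_days),
    "STOCK_B": market + rng.normal(0.0, 0.005, n_days),
    "MARKET": market,
})

# Pairwise Pearson correlation, same call as in the article
corr = synthetic_returns.corr()

# The same number by hand: covariance / (std of A * std of B)
a, b = synthetic_returns["STOCK_A"], synthetic_returns["STOCK_B"]
manual_corr = a.cov(b) / (a.std() * b.std())

print(corr.round(2))
print(f"Manual STOCK_A/STOCK_B correlation: {manual_corr:.4f}")
```

Because both synthetic stocks inherit most of their variance from the shared market term, they end up strongly correlated with each other even though their individual noise is independent.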

Visualizing Trends: Rolling Statistics

To smooth out short-term price volatility and identify underlying trends, quants use rolling statistics. The most common is the Simple Moving Average (SMA). Let’s plot the 50-day and 200-day SMAs for Microsoft, a classic technical indicator setup.

# Isolate MSFT's adjusted close price
msft_price = adj_close_df['MSFT']

# Calculate 50-day and 200-day SMAs
sma_50 = msft_price.rolling(window=50).mean()
sma_200 = msft_price.rolling(window=200).mean()

# Plot the price and the moving averages
plt.figure(figsize=(14, 7))
plt.plot(msft_price, label='MSFT Adj Close', color='skyblue', alpha=0.8)
plt.plot(sma_50, label='50-Day SMA', color='orange', linestyle='--')
plt.plot(sma_200, label='200-Day SMA', color='red', linestyle='--')
plt.title('MSFT Price and Moving Averages', fontsize=16)
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()

Step-by-Step Explanation:

  1. .rolling(window=50) creates a rolling window object. It doesn’t calculate anything on its own; it just defines the window size (50 rows, which here means 50 trading days).
  2. .mean() is the aggregation function we apply to the window. For each day, it averages that day’s price and the 49 before it. The first 49 values are NaN because a full window isn’t available yet, which is why each SMA line starts later than the price line.
  3. We then plot the original price along with both SMAs.

Expected Output:

[Image: A line chart of MSFT’s stock price with the 50-day and 200-day moving average lines overlaid.]
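If the rolling window mechanics feel abstract, a tiny example makes them concrete. This is a minimal sketch on made-up prices with a 3-row window, small enough to check by hand. It also shows min_periods, a standard .rolling() parameter not used in the article, which averages whatever data is available instead of returning NaN.

```python
import pandas as pd

# Hypothetical prices, chosen so the averages are easy to verify by hand
prices = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# 3-row rolling mean: each value averages the current row and the two before it,
# so the first two rows have no full window and come back as NaN
sma_3 = prices.rolling(window=3).mean()
print(sma_3)

# min_periods=1 averages however many rows are available so far
sma_3_partial = prices.rolling(window=3, min_periods=1).mean()
print(sma_3_partial)
```

The third value of sma_3 is (10 + 20 + 30) / 3 = 20, confirming that the window includes the current row rather than only prior ones.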

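The points where the short-term average (50-day) crosses the long-term average (200-day) are often interpreted by technical analysts as significant trading signals. A cross above is called a “Golden Cross” and is usually read as bullish; a cross below is a “Death Cross” and is usually read as bearish.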
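Spotting crossovers by eye works on a chart, but you can also flag them in code. The sketch below is one possible approach, not a trading recommendation: it uses a synthetic price path that falls and then rises, with short 5-row and 20-row windows standing in for the 50-day and 200-day SMAs so the example stays small.

```python
import numpy as np
import pandas as pd

# Hypothetical price path: falls, then rises, so the short SMA must cross the long one
prices = pd.Series(np.concatenate([np.linspace(100, 80, 30), np.linspace(80, 120, 30)]))

short_sma = prices.rolling(window=5).mean()
long_sma = prices.rolling(window=20).mean()

# Keep only rows where both averages exist, then flag which one is on top
valid = pd.DataFrame({"short": short_sma, "long": long_sma}).dropna()
short_above = valid["short"] > valid["long"]

# diff() on the 0/1 flag is +1 when the short SMA moves above the long SMA,
# -1 when it moves below, and 0 (or NaN on the first row) otherwise
change = short_above.astype(int).diff()
golden_cross = change == 1
death_cross = change == -1

print("Golden crosses at rows:", list(golden_cross[golden_cross].index))
print("Death crosses at rows:", list(death_cross[death_cross].index))
```

On this path there is a single golden cross shortly after prices turn upward and no death cross. Swapping in msft_price with windows of 50 and 200 applies the same logic to the chart above.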

The code above calculates Simple Moving Averages. Another popular type is the Exponential Moving Average (EMA), which gives more weight to recent prices. Pandas has a method for this: .ewm(). Can you modify the code to plot the 50-day and 200-day EMAs instead? Share your code and plot in the comments!
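As a starting point for the exercise, here is a small sketch on made-up prices comparing how an SMA and an EMA react to a sudden jump. It assumes the common convention of setting the EMA's span equal to the window length, and uses adjust=False so that pandas applies the plain recursive formula.

```python
import pandas as pd

# Hypothetical prices with a sudden jump, to show how quickly each average reacts
prices = pd.Series([10.0, 10.0, 10.0, 20.0, 20.0])

# Simple moving average: equal weight on each of the last 3 rows
sma = prices.rolling(window=3).mean()

# Exponential moving average: span=3 implies alpha = 2 / (span + 1) = 0.5.
# adjust=False uses the recursion ema_t = alpha * price_t + (1 - alpha) * ema_{t-1}
ema = prices.ewm(span=3, adjust=False).mean()

print(pd.DataFrame({"price": prices, "sma_3": sma, "ema_3": ema}))
```

On the row where the price jumps, the EMA moves halfway to the new level while the SMA moves only a third of the way, which is the "more weight to recent prices" behavior in action. The same call with span=50 and span=200 on msft_price is one way to approach the exercise.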

Conclusion and Your Next Steps as a Quant

This is a huge milestone. You have successfully completed the entire quantitative analysis workflow: from setting up your environment and learning the language to fetching, cleaning, analyzing, and visualizing real financial data. You now possess the core, practical Python skills required of any junior quant.

But this is just the beginning of your journey. Where do you go from here?

  • Algorithmic Trading: Use a library like backtrader or vectorbt to backtest the moving average crossover strategy we visualized.
  • Portfolio Optimization: Use the expected returns and the covariance matrix (the unscaled counterpart of our correlation matrix, which also carries each asset’s volatility) to find the “optimal” portfolio allocation using Modern Portfolio Theory.
  • Financial Machine Learning: Apply machine learning models from scikit-learn to try and predict the direction of the next day’s return.

The world of quantitative finance is vast and exciting. You now have the foundational key to unlock it. Keep experimenting, keep learning, and keep coding.
