Attribution analysis of portfolios typically aims to discover the impact that a portfolio manager’s investment choices and strategies had on overall profitability. They can help determine whether success was the result of an educated choice or simply good luck. Usually a benchmark is chosen and the portfolio’s performance is assessed relative to it.

This post, however, considers the question of whether a *non-referential* assessment is possible. That is, can we deconstruct and assess a portfolio’s performance without employing a benchmark? Such an analysis would require access to historical return as well as the portfolio’s weights and perhaps the volatility of interest rates, if some of the components exhibit a dependence on them. This list of required variables is by no means exhaustive.

There are two prevalent approaches to attribution analysis—one based on factor models and the other on return decomposition. The factor model approach considers the equities in a portfolio at a single point in time and attributes performance to various macro- and micro-economic factors prevalent at that time. The effects of these factors are aggregated at the portfolio level and a qualitative assessment is done. Return decomposition, on the other hand, explores the manner in which positive portfolio returns are achieved across time. The principal drivers of performance are separated and further analyzed. In addition to a year’s worth of time series data for the variables listed in the previous paragraph, covariance, correlation, and cluster analyses and other mathematical methods would likely be required.

## Normality Assumption

Is the normality assumption for stock returns fully justified? Are sample means and variances good proxies for population means and variances? This assumption is worth testing because Normality and the Central Limit Theorem are widely assumed when dealing with financial data. The Delta-Normal Value at Risk (VaR) method, which is widely used to compute portfolio VaR, assumes that stock returns and allied risk factors are normally distributed. Normality is also implicitly assumed in financial literature. Consider the distribution of S&P returns from May 1980 to May 2017 displayed in Figure 1.

#### Figure One: Distribution of S&P Returns

Panel (a) is a histogram of S&P daily returns from January 2001 to January 2017. The red curve is a Gaussian fit. Panel (b) shows the same data on a semi-log plot (logarithmic Y axis). The semi-log plot emphasizes the tail events.

The returns displayed in the left panel of figure 1 have a higher central peak and the “shoulders” are somewhat wider than what is predicted by the Gaussian fit. This mismatch in the tails is more visible in the semi-log plot shown in panel (b). This demonstrates that a normal distribution is probably not a very accurate assumption. Sigma, the standard deviation, is typically used as a measure of the relative magnitude of market moves and as a rough proxy for the occurrence of such events. The normal distribution places the odds of a minus-5 sigma swing at only 2.86×10-5 %. In other words, assuming 252 trading days per year, a drop of this magnitude should occur once in every 13,000 years! However, an examination of S&P returns over the 37-year period cited shows drops of 5 standard deviations or greater on 15 occasions. Assuming a normal distribution would consistently underestimate the occurrence of tail events.

We conducted a subsequent analysis focusing on the daily returns of SPY, a popular exchange-traded fund (ETF). This ETF tracks 503 component instruments. Using returns from July 01, 2016 through June 31, 2017, we tested each component instrument’s return vector for normality using the Chi-Square Test, the Kurtosis estimate, and a visual inspection of the Q-Q plot. Brief explanations of these methods are provided below.

### Chi-Square Test

This is a goodness-of-fit test that assumes a specific data distribution (Null hypothesis) and then tests that assumption. The test evaluates the deviations of the model predictions (Normal distribution, in this instance) from empirical values. If the resulting computed test statistic is large, then the observed and expected values are not close and the model is deemed a poor fit to the data. Thus, the Null hypothesis assumption of a specific distribution is rejected.

### Kurtosis

The kurtosis of any univariate standard-Normal distribution is 3. Any deviations from this value imply that the data distribution is correspondingly non-Normal. An example is illustrated in Figures 2, 3, and 4, below.

### Q-Q Plot

Quantile-quantile (QQ) plots are graphs on which quantiles from two distributions are plotted relative to each other. If the distributions correspond, then the plot appears linear. This is a visual assessment rather than a quantitative estimation. A sample set of results is shown in Figures 2, 3, and 4, below.

#### Figure Two: Year’s Returns for Exxon

**Figure 2.** The left panel shows the histogram of a year’s returns for Exxon (XOM). The null hypothesis was rejected with the conclusion that the data is not normally distributed. The kurtosis was 6 which implies a deviation from normality. The Q-Q plot in the right panel reinforces these conclusions.

#### Figure Three: Year’s Returns for Boeing

**Figure 3.** The left panel shows the histogram of a year’s returns for Boeing (BA). The data is not normally distributed and shows a significant skewness also. The kurtosis was 12.83 and implies a significant deviation from normality. The Q-Q plot in the right panel confirms this.

For the sake of comparison, we also show returns that exhibit normality in the next figure.

#### Figure Four: Year’s Returns for Xerox

The left panel shows the histogram of a year’s returns for Xerox (XRX). The data is normally distributed, which is apparent from a visual inspection of both panels. The kurtosis was 3.23 which is very close to the value for a theoretical normal distribution.

Machine learning literature has several suggestions for addressing this problem, including *Kernel Density Estimation* and *Mixture Density Networks*. If the data exhibits multi-modal behavior, learning a multi-modal mixture model is a possible approach.

## Stationarity Assumption

In addition to normality, we also make untested assumptions regarding stationarity. This critical assumption is implicit when computing covariances and correlations. We also tend to overlook insufficient sample sizes. As observed earlier, the SPY dataset we had at our disposal consisted of 503 instruments, with around 250 returns per instrument. The number of observations is much lower than the dimensionality of the data. This will produce a covariance matrix which is not full-rank and, consequently, its inverse will not exist. Singular covariance matrices are highly problematic when computing the risk-return efficiency loci in the analysis of portfolios. We tested the returns of all instruments for stationarity using the Augmented Dickey Fuller (ADF) test. Several return vectors were non-stationary. Non-stationarity and sample size issues can’t be wished away because the financial markets are fluid with new firms coming into existence and existing firms disappearing due bankruptcies or acquisitions. Consequently, limited financial histories will be encountered and must be dealt with.

This is a problem where machine learning can be profitably employed. *Shrinkage methods*, *Latent factor models*, *Empirical Bayes estimators* and *Random matrix theory* based models are widely published techniques that are applicable here.

## Portfolio Performance Analysis

Once issues surrounding untested assumptions have addressed, we can focus on portfolio performance analysis–a subject with a vast collection of books and papers devoted to it. We limit our attention here to one aspect of portfolio performance analysis – an inquiry into the clustering behavior of stocks in a portfolio.

Books on portfolio theory devote substantial space to the discussion of asset diversification to achieve an optimum balance of risk and return. To properly diversify assets, we need to know if resources have been over-allocated to a specific sector and, consequently, under-allocated to others. Cluster analysis can help to answer this. A pertinent question is how to best measure the difference or similarity between stocks. One way would be to estimate correlations between stocks. This approach has its own weaknesses, some of which have been discussed in earlier sections. Even if we had a statistically significant set of observations, we are faced with the problem of changing correlations during the course of a year due to structural and regime shifts caused by intermittent periods of stress. Even in the absence of stress, correlations can break down or change due to factors that are endogenous to individual stocks.

We can estimate similarity and visualize clusters using histogram analysis. However, histograms eliminate temporal information. To overcome this constraint, we used *Spectral Clustering*, which is a machine learning technique that explores cluster formation without neglecting temporal information.

Figures 5 to 7 display preliminary results from our cluster analysis. Analyses like this will enable portfolio managers to realize clustering patterns and their strengths in their portfolios. They will also help guide decisions on reweighting portfolio components and diversification.

#### Figures 5-7: Cluster Analyses

**Figure 5.** Cluster analysis of a limited set of stocks is shown here. The labels indicate the names of the firms. Clusters are illustrated by various colored bullets, and increasing distances indicate decreasing similarities. Within clusters, stronger affinities are indicated by greater connecting line weights.

The following figures display magnified views of individual clusters.

**Figure 6.** We can see that Procter & Gamble, Kimberly Clark and Colgate Palmolive form a cluster (top left, dark green bullets). Likewise, Bank of America, Wells Fargo and Goldman Sachs form a cluster (top right, light green bullets). This is not surprising as these two clusters represent two sectors: consumer products and banking. Line weights are correlated to affinities within sectors.

**Figure 7.** The cluster on the left displays stocks in the technology sector, while the clusters on the right represent firms in the defense industry (top) and the energy sector (bottom).

In this post, we raised questions about standard assumptions that are made when analyzing portfolios. We also suggested possible solutions from machine learning literature. We subsequently analyzed one year’s worth of returns of SPY to identify clusters and their strengths and discussed the value of such an analysis to portfolio managers in evaluating risk and reweighting or diversifying their portfolios.