Big Data in Small Dimensions: Machine Learning Methods for Data Visualization

Analysts and data scientists are constantly seeking new ways to parse increasingly intricate datasets, many of which are deemed “high dimensional,” i.e., containing many (sometimes hundreds or more) individual variables. Machine learning has recently emerged as one such technique due to its exceptional ability to process massive quantities of data. A particularly useful machine learning...

Tuning Machine Learning Models

Tuning is the process of maximizing a model’s performance without overfitting or introducing excessive variance. In machine learning, this is accomplished by selecting appropriate “hyperparameters.” Hyperparameters can be thought of as the “dials” or “knobs” of a machine learning model. Choosing an appropriate set of hyperparameters is crucial for model accuracy, but...
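As a rough illustration of turning one such “dial,” the sketch below tries several values of a single hyperparameter (the number of neighbors k in a toy nearest-neighbor classifier) and keeps the one that scores best on held-out validation data. The data, the function names, and the choice of classifier are all assumptions made for this example; they are not taken from the post.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, held_out, k):
    """Share of held-out points that the classifier labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)

def tune_k(train, validation, candidates):
    """Return the candidate k with the best validation accuracy, plus all scores."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # ties go to the earliest candidate
    return best_k, scores

# Hypothetical toy data: (feature, label) pairs, with one noisy point at 2.5.
train = [(1.0, 0), (2.0, 0), (2.5, 1), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
validation = [(2.4, 0), (2.8, 0), (6.5, 1), (7.5, 1)]

best_k, scores = tune_k(train, validation, candidates=[1, 3, 5])
print(best_k, scores)
```

Scoring on data the model was not fitted to is what guards against overfitting here: k = 1 follows the noisy training point and misclassifies a nearby validation point, while larger values of k smooth it out.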

Evaluating Supervised and Unsupervised Learning Models

Model evaluation is the process of objectively measuring how well machine learning models perform the specific tasks they were designed to do—such as predicting a stock price or appropriately flagging credit card transactions as fraud. Because each machine learning model is unique, optimal methods of evaluation vary depending on whether the model in question is “supervised” or “unsupervised.” Supervised machine learning models make specific predictions or classifications based on labeled training data, while unsupervised machine learning models seek to cluster or otherwise find patterns in unlabeled data.
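To make the distinction concrete, here is a minimal sketch of one common metric for each case: precision and recall for a supervised fraud flag, and the mean silhouette coefficient for an unsupervised clustering. These metrics are chosen for illustration and may not be the ones the post covers; the data and function names are hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Supervised case: compare predictions against known labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def silhouette(points, labels):
    """Unsupervised case: mean silhouette coefficient for 1-D points.

    Assumes at least two clusters and that the points are not all identical.
    """
    clusters = {}
    for x, c in zip(points, labels):
        clusters.setdefault(c, []).append(x)
    total = 0.0
    for x, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(abs(x - y) for y in other) / len(other)
                for k, other in clusters.items() if k != c)
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical fraud flags: 1 = fraud, 0 = legitimate.
precision, recall = precision_recall([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])

# Hypothetical cluster assignments for four unlabeled points.
score = silhouette([1.0, 2.0, 9.0, 10.0], [0, 0, 1, 1])
print(precision, recall, score)
```

The supervised metric needs ground-truth labels, whereas the silhouette coefficient uses only the data and the cluster assignments, which is why it remains available when no labels exist.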

Feature Selection – Machine Learning Methods

Feature selection in machine learning refers to the process of isolating only those variables (or “features”) in a dataset that are pertinent to the analysis. Failure to do this effectively has many drawbacks, including: 1) unnecessarily complex models with difficult-to-interpret outcomes, 2) longer computing time, and 3) collinearity and overfitting. Effective feature selection eliminates redundant variables and keeps only the best subset of predictors in the model, thus making it possible to represent the data in the simplest way.
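One simple way to remove redundant variables is a correlation filter: keep a feature only if it is not almost perfectly correlated with a feature already kept. The sketch below is a minimal version of that idea, assuming numeric, non-constant features; the feature names, data, and the 0.95 threshold are hypothetical, and this is only one of many feature selection approaches.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(features, threshold=0.95):
    """Greedily keep features whose |correlation| with every kept feature is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical dataset: the second column is the first in different units.
features = {
    "loan_amount": [100, 200, 300, 400, 500],
    "loan_amount_thousands": [0.1, 0.2, 0.3, 0.4, 0.5],
    "borrower_age": [45, 23, 36, 52, 31],
}

selected = drop_redundant(features)
print(selected)
```

A filter like this addresses collinearity only. It does not check whether a feature is predictive of the target, so in practice it would typically be combined with other selection methods.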

Machine Learning and Portfolio Performance Analysis

Attribution analysis of portfolios typically aims to discover the impact that a portfolio manager’s investment choices and strategies had on overall profitability. It can help determine whether success was the result of an educated choice or simply good luck. Usually a benchmark is chosen and the portfolio’s performance is assessed relative to it. This post, however, considers the question of whether a non-referential assessment is possible. That is, can we deconstruct and assess a portfolio’s performance without employing a benchmark? Such an analysis would require access to historical returns as well as the portfolio’s weights and perhaps the volatility of interest rates, if some of the components exhibit a dependence on them. This list of required variables is by no means exhaustive.

An Introduction to Machine Learning

There are two main challenges when implementing a machine learning solution: building a model that performs well and effectively leveraging the results. Having a good understanding of the machine learning process and model being used is key to tackling both issues. Using a predictive model without appropriately understanding it can substantially increase risk and lead to missed opportunities. If the performance of a model is unclear, misunderstood, or overestimated, then subsequent decisions will be biased or outright wrong. Likewise, if the ability of a model is underestimated, then its use will not be optimized.

Harnessing Machine Learning in Finance to Improve Model Results

Models based on machine learning are being increasingly adopted by the finance community in general and the mortgage market in particular. The use of modeling and data analytics has been key in the turnaround of this market; however, anyone who has worked with mortgage loan data knows it is notorious for errors and data gaps. Despite industry-wide efforts to incorporate robust quality control programs, challenges with mortgage data persist. Fortunately, combining machine learning in finance with cloud computing shows promise in addressing mortgage data gaps and producing more accurate results than traditional approaches.