mTrade Succinctly for Non-Sales
2022-08-23
Preface
The whales do not sing because they have an answer, they sing because they have a song. - Gregory Colbert
Why does this book exist?
Like you, I am not a financial expert. But people say technology is changing the world, and mTrade is part of that change, so you should know what mTrade is changing and where you fit in. In the coming chapters, we will introduce the market, mTrade's value proposition, and the technologies involved.
The jargon can be daunting for newcomers to the financial world, but it doesn't need to be. The truth is that no financial expert can know everything in the mortgage space without having been part of it. I know you can google and Wikipedia these topics, but I hope this little book will save you the time of piecing together the market and mTrade as a whole, so you can hit the ground running in no time.
Who should read this book?
I write this book with the following audiences in mind, all of whom are new to whole loan trading:
- Developers
- Business Analysts
- Customer Support
- Technical Product Managers
What will you learn?
The book is currently divided into 5 parts:
- Chapter 1 introduces the mortgage market and its ecosystem in the US.
- Chapters 2-4 provide tools to organize your data and prepare the most common data sets used in financial research. Although much important data sits behind paywalls, we start by describing various open-source data sources and how to download them. We then move on to preparing the two most popular data sets in financial research: CRSP and Compustat. Then, we cover corporate bond data from TRACE. We reuse the data from these chapters in all subsequent chapters.
- Chapters 5-10 deal with key concepts of empirical asset pricing such as beta estimation, portfolio sorts, performance analysis, and asset pricing regressions.
- Chapters 11-13 apply linear models to panel data and machine learning methods to problems in factor selection and option pricing.
- Chapters 14-15 provide approaches for parametric, constrained portfolio optimization, and backtesting procedures.
Each chapter is self-contained and can be read individually. Yet the data chapters provide important background necessary for the data management in all other chapters.
What won’t you learn?
This book is about empirical work. While we assume only basic knowledge of statistics and econometrics, we do not provide detailed treatments of the underlying theoretical models or methods applied in this book. Instead, you will find references to the seminal academic work in journal articles or textbooks for more detailed treatments. We believe that our comparative advantage is to provide a thorough implementation of typical approaches such as portfolio sorts, backtesting procedures, regressions, machine learning methods, or other related topics in empirical finance. We enrich our implementations with discussions of the nitty-gritty choices you face while conducting empirical analyses. We hence refrain from deriving theoretical models or extensively discussing the statistical properties of well-established tools.
Our book is close in spirit to other books that provide fully reproducible code for financial applications. We view them as complementary to our work and want to highlight the differences.
Why tidy?
As you start working with data, you quickly realize that you spend a lot of time reading, cleaning, and transforming your data. In fact, it is often said that more than 80% of data analysis is spent on preparing data. By tidying data, we want to structure data sets to facilitate further analyses. As (Wickham2014?) puts it:
[T]idy datasets are all alike, but every messy dataset is messy in its own way. Tidy datasets provide a standardized way to link the structure of a dataset (its physical layout) with its semantics (its meaning).
In its essence, tidy data follows these three principles:
- Every column is a variable.
- Every row is an observation.
- Every cell is a single value.
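As a minimal sketch of these principles, consider a hypothetical data set of annual returns (the data and column names below are invented for illustration). The years are spread across column headers, so one variable occupies several columns; `pivot_longer()` from tidyr restores the tidy layout.

```r
library(tidyr)
library(dplyr)

# Messy: the variable "year" is spread across column headers.
returns_wide <- tibble(
  ticker = c("AAPL", "MSFT"),
  `2020` = c(0.31, 0.41),
  `2021` = c(0.35, 0.52)
)

# Tidy: every column is a variable, every row an observation,
# every cell a single value.
returns_tidy <- returns_wide |>
  pivot_longer(
    cols = -ticker,
    names_to = "year",
    values_to = "return"
  )
```

After tidying, each row is one ticker-year observation, which makes grouping, filtering, and plotting straightforward.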
Throughout this book, we try to follow these principles as best as we can. If you want to learn more about tidy data principles in an informal manner, we refer you to this vignette as part of (tidyr?).
In addition to the data layer, there are also tidy coding principles outlined in the tidy tools manifesto that we try to follow:
- Reuse existing data structures.
- Compose simple functions with the pipe.
- Embrace functional programming.
- Design for humans.
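To illustrate the "compose simple functions" and "embrace functional programming" points, here is a small hedged sketch (the tickers and file names are made up) using purrr instead of an explicit loop:

```r
library(purrr)

# Functional style: map a small, pure function over a vector
# rather than writing a for loop with manual bookkeeping.
tickers <- c("AAPL", "MSFT", "GOOG")
filenames <- map_chr(tickers, \(x) paste0(x, ".csv"))
# filenames is now c("AAPL.csv", "MSFT.csv", "GOOG.csv")
```

The same pattern scales naturally: swap `paste0()` for a download or parsing function and the surrounding code does not change.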
In particular, we heavily draw on a set of packages called the tidyverse (tidyverse?). The tidyverse is a consistent set of packages for all data analysis tasks, ranging from importing and wrangling to visualizing and modeling data with the same grammar. In addition to explicit tidy principles, the tidyverse has further benefits: (i) if you master one package, it is easier to master others, and (ii) the core packages are developed and maintained by the Public Benefit Company RStudio, Inc.
These core packages contained in the tidyverse are: ggplot2 (ggplot2?), dplyr (dplyr?), tidyr (tidyr?), readr (readr?), purrr (purrr?), tibble (tibble?), stringr (stringr?), and forcats (forcats?).
Throughout the book we use the native pipe |>, a powerful tool to clearly express a sequence of operations. Readers familiar with the tidyverse may be used to the predecessor %>% that is part of the magrittr package. For all our applications, the native and magrittr pipe behave identically, so we opt for the one that is simpler and part of base R. For a more thorough discussion on the subtle differences between the two pipes, we refer to the second edition of (Wickham2016?).
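As a quick side-by-side, both pipes pass the left-hand side as the first argument of the call on the right, so for the simple case below they are interchangeable (the magrittr version additionally requires loading magrittr or a tidyverse package):

```r
# Native pipe, part of base R since 4.1:
mtcars |> subset(cyl == 4) |> nrow()

# magrittr pipe, equivalent here:
library(magrittr)
mtcars %>% subset(cyl == 4) %>% nrow()
```

The differences only surface in advanced uses such as placeholder arguments, which the referenced textbook discusses in detail.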
Unofficial History of mTrade
You may know that mTrade has roots in a previous company called FNC, Inc. Before the 2008 mortgage crisis, it was estimated that 60% of appraisals in the US ran through the FNC platform. When the market crashed, it took down many lenders, especially subprime lenders. Six of the top 10 subprime lenders were on FNC's platform. After the crash, FNC remained a critical vendor in the origination market. During that period, FNC participated in creating UAD and almost became the technology provider of UCDP but lost the bid for a non-technology reason. FNC also bid on Freddie Mac's Collateral Advisor engine and lost because we were "expensive." Later, when TRID was enforced by the CFPB, FNC created the "LoanPort" product centered around UCD to facilitate that process.
In the early 2010s, FNC expanded its product line with two more applications: CMS.Duet and Clean Room. CMS.Duet focused on appraisal due diligence in loan acquisitions, while Clean Room became the DE we know today. Finally, in 2016, FNC was acquired by CoreLogic, and as a result, Mortgage Trade was incorporated.
Yes, I know there are many acronyms. No worries, we will get to know each of them at the proper time.
Colophon
This book was written in RStudio using bookdown. The website is hosted with GitHub Pages and automatically updated after every commit. The complete source is available from GitHub.
We generated all plots in this book using ggplot2 and its classic dark-on-light theme (theme_bw()).
This version of the book was built with R version 4.2.1 (2022-06-23, Funny-Looking Kid) and the following packages: