Book List

This is a non-exhaustive and mostly unordered list of books and courses I have read or taken, along with some general thoughts about them.

Books and Courses

| Title | Author(s) | Finished | Link |
|---|---|---|---|
| Introduction to Statistical Learning | James et al. | Read | python, R |
| Elements of Statistical Learning | Hastie et al. | Read | here |
| Bayesian Data Analysis, 3rd Edition | Gelman et al. | Read | here |
| A First Course in Bayesian Statistical Methods | Peter D. Hoff | 200 pages in | here |
| Advances in Financial Machine Learning | Marcos Lopez de Prado | Skimmed through | here |
| Machine Learning for Asset Managers | Marcos Lopez de Prado | Read | here |
| Testing and Tuning Market Trading Systems | Timothy Masters | Read | here |
| The Simple And Infinite Joy Of Mathematical Statistics | J.N. Corcoran | Read | here |
| Automate the Boring Stuff with Python | Al Sweigart | Read | here |
| Learn Python 3 the Hard Way | Zed Shaw | Read | here |
| Python Crash Course | Eric Matthes | Read | here |
| Python for Data Science and Machine Learning | Jose Portilla | Completed | here |
| Coursera: C for everyone | Ira Pohl | Completed | here |
| Effective C | Robert C. Seacord | On list | here |
| Pikuma: C++ Game Engine Programming | Gustavo Pezzi | Completed | here |
| Pikuma: 3D Computer Graphics Programming | Gustavo Pezzi | On list | here |
| Pikuma: 2D Game Physics Programming | Gustavo Pezzi | On list | here |
| Leetcode | N/A | N/A | here |
| Game Programming Patterns | Robert Nystrom | 100 pages in | here |
| Computer Enhance | Casey Muratori | On list | here |
| Coursera: Supervised Machine Learning: Regression and Classification | Andrew Ng | Completed | here |
| Operating Systems: Three Easy Pieces | Arpaci-Dusseau & Arpaci-Dusseau | On list | here |
| Engineering a Compiler | Cooper & Torczon | On list | here |
| Structure and Interpretation of Computer Programs (SICP) (Wizard Book) | Abelson, Sussman, & Sussman | On list | here |
| Programming: Principles and Practice using C++ | Stroustrup | 150 pages in | here |
| Machine Learning with PyTorch and Scikit-Learn | Raschka et al. | 200 pages in | here |
| Forecasting: Principles and Practice | Hyndman and Athanasopoulos | On list | here |
| Data-driven science and engineering | Brunton and Kutz | On list | here |
| Statistical Inference | Casella and Berger | 200 pages in | here |
| Computer Age Statistical Inference | Efron and Hastie | On list | here |
| Causal Inference for the Brave and the True | Facure | Mostly read | here |
| Minimize Regret | Tim Radtke | Skimmed through | here |
| The Missing Semester of Your CS Education | Various | Completed | here |
| Probabilistic Machine Learning | Murphy | On list | here |
| Econometrics | Hayashi | On list | here |
| Mastering ‘Metrics | Angrist and Pischke | Read | here |

Thoughts

If a book or course doesn’t have a section, it’s because I have nothing to say about it for now (usually because I haven’t started/finished it yet).

Introduction to Statistical Learning (ISL)

An excellent introduction to data science with lots of practical examples. Suitable for data analysts and business analysts, or non-technical people who want to see a bit of what happens under the hood.

Elements of Statistical Learning (ESL)

If you want more solid statistical foundations for what you learned in ISL, this is the book. If you master what’s in there, you can become a really solid data scientist in industry.

Bayesian Data Analysis

Considered to be the core book of modern Bayesian methods, by the authors of the Stan computing package. I haven’t finished it yet, but am thoroughly enjoying it. Most of my formal statistics training was in the frequentist/Fisherian fashion, and I knew very little beyond the basic definition of Bayes’ rule. Here are some things I’ve enjoyed so far:

  • I find that I’m learning a lot by contrast to frequentist methods
  • I’m learning a different way of looking at distributions and uncertainty
  • I’m reframing a lot of what I think could be delivered in a business setting

A First Course in Bayesian Statistical Methods

This book is a great introduction (and further) to Bayesian methods. I think it’s a better introduction than BDA by Gelman et al. because it’s more approachable while covering much of the same fundamental material. It’s also much nicer to have all the equations and all the code (although it is in R), which makes things much easier to understand.

Advances in Financial Machine Learning

This book has a lot of nice advanced topics that I didn’t see mentioned anywhere else. Things like how to store your data, data transformations to get a present discounted value of your assets across wide asset classes, how to use information theory to derive useful trading signals, how to avoid data mining… It’s presented so plainly and obviously that I felt almost dumb for not thinking of the content of the book myself. The code is written very plainly using mostly numpy and statsmodels and, in my (partial) experience, it works.

Machine Learning for Asset Managers

This book is a collection of papers by the same author as Advances in Financial Machine Learning. I found this book approachable and helpful in expanding my repertoire of techniques, but in practice I was unsuccessful in applying a lot of the methods described there. Maybe this reflects my lack of understanding, my lack of skill, or that I was in the wrong place to employ these techniques. Regardless, it is still a good, thought-provoking read.

Testing and Tuning Market Trading Systems

At the time of writing, this book has a rating of 3.8 on Amazon, which is ludicrously low. As far as I know, this book has the best coverage of the dangers of in-sample bias. It has many examples that demonstrate what in-sample bias does to your forecasts, details methods (mostly advanced ways to train-test split your timeseries data) to avoid this bias, and discusses their trade-offs (especially in a financial context where the signal-to-noise ratio is poor).

In industry, especially in a timeseries context, I’ve seen many people unsure of how to approach these problems in practice. As a result, most of what I’ve seen has been akin to putting a wet finger in the air to decide how to measure a model’s forecasting efficacy.

If you want to know whether you should have a train, validate, and test set; whether to retrain on your train and validate sets before running on test; or what to do with your testing errors, then this is the book for you.
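To make the train, validate, and test question concrete, here is a minimal Python sketch of a generic walk-forward split. It is not taken from the book (whose code is in C++); the function name `walk_forward_splits` and its parameters are my own illustrative choices, and it assumes observations are already sorted in time order.

```python
from typing import Iterator, List, Tuple

# One fold: (train indices, validate indices, test indices).
Split = Tuple[List[int], List[int], List[int]]


def walk_forward_splits(
    n_obs: int, train_size: int, val_size: int, test_size: int
) -> Iterator[Split]:
    """Yield (train, validate, test) index lists that move forward in time.

    Within each fold, every train index precedes every validate index,
    which in turn precedes every test index, so no future data leaks
    into fitting or tuning.
    """
    if min(train_size, val_size, test_size) <= 0:
        raise ValueError("all block sizes must be positive")

    fold_len = train_size + val_size + test_size
    start = 0
    while start + fold_len <= n_obs:
        train_end = start + train_size
        val_end = train_end + val_size
        test_end = val_end + test_size
        yield (
            list(range(start, train_end)),
            list(range(train_end, val_end)),
            list(range(val_end, test_end)),
        )
        # Slide by one test block so test windows never overlap.
        start += test_size


if __name__ == "__main__":
    for train, validate, test in walk_forward_splits(12, 6, 2, 2):
        print("train", train, "validate", validate, "test", test)
```

Whether to refit on train plus validate before scoring the test block is left to the caller here; that is one of the trade-offs the book discusses.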

Note that the code is written in C++, but it is thoroughly explained and I could follow along even with my limited understanding of C++. Moreover, all the code is available in a public GitHub repository.

The Simple And Infinite Joy Of Mathematical Statistics

This is a traditional statistics textbook, aimed at upper-level undergraduates or graduate students. I felt like my mathematics was getting rusty, and this book covered topics I hadn’t studied before, like building hypothesis tests from scratch regardless of the distribution. Moreover, it is full to the brim with deeply detailed examples. I would have loved this textbook in my own stats classes.

I found this book through the recommendation of the YouTube channel xvzf, and the author has all the lectures associated with the book (and more) uploaded here as well. I didn’t do most of the exercises, instead opting to read this to stretch my brain. I might get back to it one day since I thoroughly enjoyed it, but for now I’m focused on more applied problems.

Automate the Boring Stuff with Python, Learn Python 3 The Hard Way, Python Crash Course

I’m grouping these three together because they’re the three books that I used to first learn how to code. It took about 3 years of continuously finding the time after school or work, of doing all the exercises, of googling everything I didn’t understand… These three books were less the foundation of my computer science knowledge, and more the jumpstart that gave me enough knowledge to start doing my own things. It was the first time I felt like I was actually using my computer. I don’t know if this combination would work for anyone else, but it worked for me.

Python for Data Science and Machine Learning

This is the first online course I took, back in 2018. At the time of writing, it was last updated in 2020. I don’t know how well it has held up, but when I took this course I was looking for practical exercises. Following the lesson from Learn Python 3 the Hard Way, I wanted to just code along until the logic of writing programs entered my brain. I can’t stress this enough for anyone else like me who is learning from online resources: you have to code along and do the exercises. Choose a course where there will be lots of coding, lots of examples, and lots of practical exercises, and do as much of it as you can.

Coursera: C for everyone

After spending several years learning Python, and having started to build large projects with it, I wanted to learn more about computer science. On the advice of several friends, I decided to learn a lower-level language.

This turned out to be great advice: working at a lower level helped me understand so many things about not just programming but, for the first time, computer science. You can get quite far writing programs even if you don’t actually know what you’re doing. Python is great that way. But I’ve found that my coding skills got dramatically better once I started really understanding what I was doing (unsurprisingly).

Programming: Principles and Practice using C++

This book is by Bjarne Stroustrup, the creator of the C++ language. After learning the basics of C, I figured I would have a look at C++. I didn’t know what to expect from this book, besides that it is controversial (mostly for pushing bad practices like importing the entire standard library and for not being “real industrial C++”, whatever that means), but I was pleasantly surprised. One of the first big projects is to build a calculator that is then progressively updated. This is nice, because it’s basically akin to writing a simple compiler, a project I’ve been wanting to tackle to test my mettle. Unfortunately I was working with the 2nd edition, and a lot of the provided code that is used for the graphics part of the book wouldn’t compile, so I had to give up halfway through. Since the third edition came out, I’ve been meaning to give it another shot.
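To show why a calculator resembles the front end of a simple compiler, here is a minimal Python sketch of a tokenizer plus a recursive-descent evaluator. It is not the book's C++ code; the names `tokenize`, `Calculator`, and `evaluate` are illustrative, and the grammar (expression, term, primary) is a common textbook layout that I am assuming for the example.

```python
import re
from typing import List, Optional

# A token is a number (e.g. 3 or 3.5) or one of the symbols + - * / ( ).
TOKEN_PATTERN = re.compile(r"\s*(\d+\.?\d*|[-+*/()])")


def tokenize(text: str) -> List[str]:
    """Split an arithmetic expression into a flat list of tokens."""
    text = text.rstrip()
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Calculator:
    """Recursive-descent evaluator: one method per grammar rule."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> Optional[str]:
        token = self.peek()
        self.pos += 1
        return token

    def expression(self) -> float:
        # expression := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> float:
        # term := primary (('*' | '/') primary)*
        value = self.primary()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.primary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def primary(self) -> float:
        # primary := number | '(' expression ')' | '-' primary
        token = self.advance()
        if token == "(":
            value = self.expression()
            if self.advance() != ")":
                raise ValueError("expected ')'")
            return value
        if token == "-":
            return -self.primary()
        if token is None or token in ("+", "*", "/", ")"):
            raise ValueError("expected a number")
        return float(token)


def evaluate(text: str) -> float:
    """Tokenize and evaluate an arithmetic expression."""
    calc = Calculator(tokenize(text))
    value = calc.expression()
    if calc.peek() is not None:
        raise ValueError("unexpected trailing input")
    return value


if __name__ == "__main__":
    print(evaluate("1 + 2 * 3"))
```

Splitting the work into a tokenizer and one function per grammar rule is the same shape a compiler's lexer and parser take, which is what makes the project a good stepping stone.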

Pikuma: C++ Game Engine Programming

I was thoroughly impressed with how good this course was. If Udemy is junk food, Pikuma is a balanced, healthy, invigorating, five-course meal. I learned about so much: CMake, linking and compiling, templates and generics, memory allocation and freeing memory, designing a game engine, composition vs. inheritance, and much more. Moreover, it perfectly fit the type of tutorial I love: from scratch, starting with an empty editor, and coding along in real time. Usually I can follow tutorials at double speed, but not with Pikuma, which tells you how dense the information was. It took several months of intense work to finish this course, and I’m very much looking forward to having the time to do other courses such as the 3D graphics one or the 2D physics one.

Leetcode

Leetcode-style exercises as a recruitment tool are quite awful, but as a resource for learning basic algorithms hands-on, Leetcode is excellent. The exercises are ordered by difficulty, you can do them in basically any language, they are quickly graded, and there is a wealth of solutions to look at if you’re really stuck. Having to crunch Leetcode for hours a day to find a job is really awful, but as an exercise book without your salary depending on it, it does its job really well.

Mastering ‘Metrics

This was the main applied textbook used for the Econometrics class that I TA’ed for when I was doing my master’s degree. I think it’s fairly well suited to the undergraduate level, although I’m not sure how well it has aged. A more advanced book, also by Angrist, Mostly Harmless Econometrics, is also quite reputable, although I’ve only skimmed it.