SailingDataLakes

Recent content on SailingDataLakes
https://sailingdatalakes.com/
Generator: Hugo -- gohugo.io | Language: en | Sun, 05 Jul 2026 00:00:00 +0000

Forward Baseball Ballistics: Mapping the Power Matrix
https://sailingdatalakes.com/projects/forward-baseball-ballistics/
Sun, 05 Jul 2026 00:00:00 +0000
Introduction
In the last post, I worked a real home run backwards: given a rough hang time, a rough distance, the wind, and the pitch I saw, the model solved for the exit velocity, launch angle, and swing speed that would have produced it. That’s an inverse problem - one observed flight, one recovered answer. But once you have a swing speed number for yourself, a much more interesting question opens up: not “what did I do,” but “what could I do?

Gradient Descent
https://sailingdatalakes.com/posts/gradient-descent/
Sat, 04 Jul 2026 00:00:00 +0000
Purpose
In linear regression, we were able to solve for the optimal parameters directly, in closed form, using ordinary least squares. Most models we care about don’t afford us that luxury. Gradient descent is the general-purpose optimization algorithm that lets us fit a model’s parameters iteratively, whenever we can’t (or don’t want to) solve for them directly.
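
To make the contrast between the closed-form and iterative routes concrete, here is a minimal sketch, not the post's own implementation. It fits a one-feature linear model both ways and compares the results. The function names (`ols_fit`, `gradient_descent_fit`), the learning rate, the step count, and the five data points are all made up for illustration.

```python
# Toy data invented for this sketch (not taken from the blog).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]


def ols_fit(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gradient_descent_fit(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize mean squared error by repeatedly stepping against the gradient."""
    n = len(xs)
    slope, intercept = 0.0, 0.0  # arbitrary starting point
    for _ in range(steps):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


if __name__ == "__main__":
    print("closed form:     ", ols_fit(xs, ys))
    print("gradient descent:", gradient_descent_fit(xs, ys))
```

With a learning rate small enough for this data, the two answers should agree to many decimal places. A rate that is too large would make the iterates diverge instead, which is one reason the known closed-form answer is a useful check.
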
In this article, we’ll cover what gradient descent is, the math and algorithm behind it, and an example implementation, building on the linear regression problem to check our work against a known answer.

Inverse Baseball Ballistics Calculations
https://sailingdatalakes.com/projects/home-run-ballistics/
Sat, 04 Jul 2026 00:00:00 +0000
Introduction
A radar/camera rig like TrackMan or Rapsodo will hand a hitter exact numbers for exit velocity (how fast the ball leaves the bat), launch angle (how steeply it comes off the bat), and swing speed (how fast the bat itself is moving at contact). Those systems cost tens of thousands of dollars and live in college programs and MLB parks, not on a random weeknight beer-league field.

Root-Finding: Newton, Halley, and Secant Methods
https://sailingdatalakes.com/posts/root-finding/
Sat, 04 Jul 2026 00:00:00 +0000
Purpose
We’ve now solved for model parameters directly, with linear regression’s closed-form OLS solution, and iteratively, with gradient descent. Both problems boiled down to finding where a derivative equals zero. The more general version of that problem (finding where some function $f$ itself equals zero, with no derivative in sight) shows up everywhere: pricing a bond to match a target yield, finding the break-even point of a nonlinear cost curve, or solving an equilibrium condition that has no closed form.

Auto Regression
https://sailingdatalakes.com/posts/auto-regression/
Fri, 03 Jul 2026 00:00:00 +0000
Purpose
In this article we’re covering Auto Regression (AR) - one of the foundational models used for time series forecasting.
We’ll build up an intuition for what makes it different from the regression models we’ve covered previously, walk through the underlying math, and then implement it from scratch, keeping the code as close to the math as possible. We’ll wrap up by fitting our implementation to a real (and famous) dataset: the Wolf sunspot numbers.

Logistic Regression
https://sailingdatalakes.com/posts/logistic-regression/
Fri, 17 May 2024 00:00:00 +0000
Purpose
The goal of this post is to better understand logistic regression. As usual, we’ll walk through the intuition, the math, some code, metrics, and finally an example.
What is Logistic Regression
Like linear regression, logistic regression is a linear model, and, also like linear regression, it is used for making real-valued predictions. So where does the logistic piece come in, and how is it a classifier?

Regularized Linear Regression
https://sailingdatalakes.com/posts/regularization/
Fri, 10 May 2024 00:00:00 +0000
Purpose
The goal of this article is first to develop an understanding of overfitting and regularization. After that, we will discuss the intuition, math, and code behind the two primary methods of regularizing linear regression: ridge regression and LASSO regression.
Overfitting
Overfitting usually occurs when a model is overly complex for a given problem or dataset and is thus able to memorize the training set.

K-Means Clustering & Variants
https://sailingdatalakes.com/posts/k-means/
Mon, 22 Apr 2024 00:00:00 +0000
Purpose
Today we’re reviewing K-Means clustering and its many variants. Clustering is a form of unsupervised learning that is quite intuitive and useful (in certain circumstances).
What is Cluster Analysis?
First, cluster analysis is a set of methodologies used to group data by similarity. It can be used as a standalone unsupervised model, as an analytical tool, or as a form of feature reduction.

Sailing Route Optimization with Q-Learning
https://sailingdatalakes.com/projects/sailing-route-optimization/
Sat, 13 Apr 2024 00:00:00 +0000
Introduction
Sailing can be hard and not always intuitive, especially when racing. Typically, a sailboat race has a starting line/finishing line and two marks (waypoints): a windward mark and a leeward mark. Before we move on, let’s go over some terminology. Windward-Leeward Course: A typical sailboat race course configuration. You have a start-finish line in the center of the course that is perpendicular to the direction of the wind.

Linear Regression
https://sailingdatalakes.com/posts/linear-regression/
Tue, 12 Mar 2024 00:00:00 +0000
Purpose
In this article, we will cover what linear regression is, what the underlying mathematics looks like, and common metrics to evaluate the model, along with an example of how to use it.
What is Linear Regression
Linear regression, as the name implies, is a linear model used for making real-valued predictions. It is a comparatively simple model that is mathematically sound, easy to explain, and easy to understand.

About
https://sailingdatalakes.com/about/
Wed, 12 Mar 2014 00:00:00 +0000
John Hale
Hello, I’m John! I’m a data scientist passionate about learning new things. Specifically, I’m interested in math, machine learning, and algorithms. I mostly enjoy spending time with my family. We’re big on hobbies, such as sailing, kayaking, cooking, fishing, swimming, and shark tooth hunting!
When I can, I’ll try to include these activities in my projects. I decided to start this blog as an accountability tool as I work to better my understanding of various machine-learning-related algorithms.