Hi, I'm Kyle

Software developer with an applied math background. I work across data engineering, machine learning, and infrastructure: from production ETL pipelines and Azure ML services to a self‑hosted K3s home lab I operate for 20+ users at 99% uptime.

Portrait of Kyle Booker

About me

I'm a Vancouver‑based engineer with a background in applied mathematics (M.Math, University of Waterloo). My work sits at the intersection of data engineering, analytics, software development, and cloud systems, where I build reliable tools that turn complex data into useful decisions.

In industry I've shipped production ETL pipelines moving 1 GB/hour, an Azure OCR service that replaced manual data entry and cut analyst workload by 70%, supervised forecasting models with REST inference endpoints, and a churn‑prediction service that drove a 15% lift in retention. I currently review and debug contributor‑submitted code (Python, C, C++, JavaScript) at Outlier AI to improve LLM training data.

Outside of work I operate a Linux home lab running 100+ Docker containers on K3s, with production‑style ingress (Traefik, Cloudflare Tunnels, Authelia SSO, WireGuard) serving 20+ active users at 99% uptime. It's where I sharpen the operational side: deployments, monitoring, backups, and incident recovery.

Skills & stack

Programming

  • Python
  • SQL
  • C++
  • JavaScript
  • TypeScript
  • C
  • C#
  • Java
  • Bash
  • HTML / CSS

Infrastructure & DevOps

  • Docker
  • K3s (Kubernetes)
  • Linux
  • Traefik
  • Cloudflare Tunnels
  • WireGuard
  • Git / CI/CD
  • REST APIs

Data & cloud

  • PostgreSQL
  • pandas
  • Azure
  • MongoDB
  • Redis
  • Apache Spark
  • AWS
  • Power BI

Scientific computing

  • NumPy / SciPy
  • MFEM (FEM)
  • Numerical PDEs
  • Discontinuous Galerkin
  • HPC / Compute Canada
  • ParaView

Experience

Stylized code editor showing reviewed lines with green checkmarks

Reviewer, AI Training (Programming & Mathematics)

Outlier AI • Jan 2024 – Present • Vancouver, BC

Run, debug, and fix contributor‑submitted Python, C, C++, and JavaScript code across math, programming, and physics tasks; verify correctness, edge‑case handling, and runtime behaviour before submissions enter the training pipeline. Audited 5,000+ submissions and authored structured feedback that improved contributor consistency and downstream model accuracy.

Python · C / C++ · JavaScript · Code review · LLM training data

Analytics dashboard with KPI cards and a forecast trend chart

Data Scientist

Intellifi (Residential Technology) • Jun 2022 – Mar 2023 • Vancouver, BC

Shipped an Azure Computer Vision OCR ingestion service in Python processing 1,000+ monthly PDFs end‑to‑end, replacing manual data entry and cutting analyst workload by 70%. Built and deployed supervised forecasting models to Azure with REST inference endpoints integrated into production systems. Owned the reporting layer over the data pipeline, exposing KPIs to product, sales, and executive stakeholders via Power BI.

Python · Azure · OCR · REST APIs · Power BI · Forecasting

ETL pipeline diagram: sources flow into a transform stage and load into an analytics store

Data Analyst

CMLS Financial (Residential Capital Markets) • Jun 2021 – Jun 2022 • Vancouver, BC

Built and operated an automated Python/SQL ETL pipeline processing 1 GB of customer data per hour, replacing manual workflows. Engineered a Selenium ingestion service scraping 30,000 real‑estate listings per week into a PostgreSQL store for analytics. Delivered a production churn‑prediction service (Python, pandas) at 80% recall, driving a 15% lift in retention via targeted campaigns.

Python · SQL · PostgreSQL · ETL · Selenium · pandas

CFD visualization

Graduate Research Assistant

University of Waterloo — Scientific Computing & CFD Lab • Sep 2018 – Apr 2021

Implemented H(div)‑conforming discontinuous Galerkin methods for multiphase flow in C++ on the MFEM finite‑element library — 2,000+ lines including custom discretizations, solvers, and numerical experiments. Ran large‑scale simulations on Compute Canada HPC clusters with parallel jobs spanning many cores and multi‑day runtimes. Built Python and ParaView pipelines for post‑processing, convergence analysis, and visualization. Authored an NSERC grant proposal, securing $27,000 in thesis research funding.

C++ · MFEM · Python · HPC / Compute Canada · ParaView · Finite element methods
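
The convergence analysis mentioned above usually comes down to estimating an observed order of accuracy from errors measured on successively refined meshes. Below is a minimal Python sketch of that calculation. The function name and the input values are hypothetical, and the errors are synthetic (generated to behave like h²); they are not results from the thesis.

```python
import math


def observed_orders(mesh_sizes, errors):
    """Estimate the convergence order between each pair of successive meshes.

    For errors behaving like e ~ C * h**p, two refinement levels give
    p ~ log(e_coarse / e_fine) / log(h_coarse / h_fine).
    """
    if len(mesh_sizes) != len(errors) or len(errors) < 2:
        raise ValueError("need matching lists with at least two refinement levels")
    orders = []
    for i in range(len(errors) - 1):
        error_ratio = errors[i] / errors[i + 1]
        size_ratio = mesh_sizes[i] / mesh_sizes[i + 1]
        orders.append(math.log(error_ratio) / math.log(size_ratio))
    return orders


# Synthetic example: errors constructed to scale like h**2 (illustrative only).
mesh_sizes = [0.1, 0.05, 0.025]
errors = [0.5 * h**2 for h in mesh_sizes]
orders = observed_orders(mesh_sizes, errors)
```

With real simulation output, the estimated orders would be compared against the rate the discretization is expected to achieve for the chosen polynomial degree.
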
Graph theory diagram

Undergraduate Research Assistant

Thompson Rivers University • May 2016 – Aug 2018 • Kamloops, BC

Refactored 500 lines of legacy C code and parallelized batch simulations with Bash and Python, doubling data throughput. Implemented Python and C algorithms for applied math research in graph theory, matroid theory, and numerical analysis.

Python · C · Bash · Graph theory · Numerical analysis

Home lab

I operate a production‑style self‑hosted infrastructure that runs my personal services and supports family and friends. It's the place I sharpen the operational craft — deployments, ingress, identity, monitoring, backups, and incident recovery — at small but real scale.

  • 100+ Docker containers
  • 20+ active users
  • 99% uptime
  • K3s orchestration
Live home‑lab metrics streaming from Grafana, updated in real time.

Edge & networking

Public ingress, DNS, and remote access

Traefik
Reverse proxy with automatic Let's Encrypt TLS; container‑aware routing chosen over Nginx for dynamic Docker discovery.
Cloudflare Tunnel + WireGuard
Tunneled public ingress (no exposed ports) and WireGuard for admin access. No static IP needed.
Pi‑hole (DoH) + AdGuard Home
Network‑wide DNS sinkhole with DNS‑over‑HTTPS upstream; AdGuard as a secondary resolver.
UniFi Controller
Network management, VLAN segmentation, and access‑point orchestration.

Identity & security

Single sign‑on, secrets, and intrusion detection

Authelia
SSO with TOTP/WebAuthn 2FA in front of every internal service, replacing ad‑hoc basic auth.
CrowdSec
Behavioral WAF and crowd‑sourced IP reputation; auto‑bans bad actors at the Traefik layer.
Vaultwarden
Self‑hosted Bitwarden‑compatible password manager; family vault shared via groups.
PrivateBin
End‑to‑end encrypted pastebin for sharing secrets out‑of‑band.

Observability

Metrics, logs, uptime, and alerting

Prometheus + Grafana
The metrics stack many production teams run; dashboards cover host, container, network, and per‑service metrics.
InfluxDB + Telegraf
Time‑series collection for long‑retention infrastructure metrics.
Uptime Kuma → Gotify
Active probing of every public endpoint; push notifications on incident.
Dozzle + GoAccess
Live container log streaming and web‑log analysis without a heavy logging pipeline.
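
The probe‑and‑notify pattern behind the Uptime Kuma → Gotify pairing can be sketched in a few lines of Python. Uptime Kuma does this natively, so the code below is only an illustration of the idea, not how the home lab is wired. All function names, endpoint URLs, and the token are hypothetical; the one real interface assumed is Gotify's `POST /message` endpoint authenticated with an application token in the `X-Gotify-Key` header.

```python
import json
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return True if the endpoint answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # HTTP errors (4xx/5xx), DNS failures, and timeouts all count as down.
        return False


def build_gotify_request(base_url, app_token, title, message, priority=8):
    """Build a POST to Gotify's /message endpoint for one notification."""
    body = json.dumps({"title": title, "message": message, "priority": priority})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/message",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Gotify-Key": app_token},
        method="POST",
    )


def send_request(request, timeout=5.0):
    """Send a prepared request and close the response."""
    with urllib.request.urlopen(request, timeout=timeout):
        pass


def check_and_notify(endpoints, base_url, app_token, probe_fn=probe, send=send_request):
    """Probe every endpoint, push one notification per failure, return the failures."""
    failed = [url for url in endpoints if not probe_fn(url)]
    for url in failed:
        send(build_gotify_request(
            base_url, app_token, "Endpoint down", f"{url} failed its health check"
        ))
    return failed
```

Passing `probe_fn` and `send` as parameters keeps the logic testable without network access; a scheduler (cron, a systemd timer, or a loop with a sleep) would call `check_and_notify` on an interval.
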

Productivity & data

Personal cloud, documents, and finance

Nextcloud
Self‑hosted file sync, calendars, and contacts; replaces Google Drive for the household.
Immich (+ Postgres)
Photo backup and ML‑powered library; replaces Google Photos for two iPhones.
Paperless‑ngx + Paperless‑AI
OCR‑indexed document archive — a direct extension of the OCR ingestion work I did at Intellifi.
Firefly III · Joplin · Wallos
Personal finance, encrypted notes sync, and subscription tracking.

Communications

Self‑hosted federated messaging

Matrix / Synapse
Federated chat homeserver backed by dedicated PostgreSQL; full ownership of message history.
Element Web
Branded web client for the homeserver, behind Authelia.
Maubot
Plugin‑based Matrix bot framework for automations and notifications.
coturn
TURN/STUN server for WebRTC voice and video NAT traversal.

Developer tooling & data stores

Where code, search, and persistence live

Gitea
Self‑hosted Git with automated nightly backups and a tested restore runbook.
code‑server
Browser‑accessible VS Code for editing infrastructure from anywhere.
Meilisearch · Overleaf
Fast typo‑tolerant search and self‑hosted LaTeX editor.
Postgres ×4 · MariaDB · MongoDB · Redis
Multiple Postgres versions side‑by‑side (13–17) for service‑specific compatibility, with shared MariaDB / Mongo / Redis pools.

How it's run

Backups
Nightly encrypted Duplicacy snapshots with off‑site rotation; restore drills run quarterly.
Updates
Container images pinned and updated on a controlled cadence rather than blind auto‑pull; rollback tested.
Monitoring & alerts
Prometheus + Grafana for metrics, Uptime Kuma for end‑to‑end availability, Gotify push for paging.
Access
Zero direct port exposure: all public traffic arrives through Cloudflare Tunnel and is gated by Traefik + Authelia + CrowdSec.
Orchestration
K3s for service workloads, plain Docker for ancillary tools, Portainer for visual inspection.
Hardware
Self‑built workstation running Unraid with 52 TB of parity‑protected storage. Hosts the container workloads alongside a Home Assistant VM and a general‑purpose Linux VM.
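
The image‑pinning rule described under Updates is easy to check mechanically. The sketch below is one way to flag unpinned image references, assuming "pinned" means either a sha256 digest or an explicit tag other than `latest`. The helper names and the example image references are hypothetical, and the parsing covers common reference shapes rather than the full image reference grammar.

```python
import re

# A digest-pinned reference ends in @sha256:<64 hex characters>.
DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image):
    """Return True if the reference has a digest or an explicit non-latest tag."""
    if DIGEST_SUFFIX.search(image):
        return True
    # Only look for a tag after the last '/', so a registry port such as
    # registry.example:5000/app is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


def unpinned(images):
    """Return the references that would float on the next pull."""
    return [image for image in images if not is_pinned(image)]


# Hypothetical references, for illustration only.
floating = unpinned([
    "traefik:v3.1",
    "traefik",
    "nextcloud:latest",
    "registry.example:5000/app",
    "registry.example:5000/app:1.2.3",
])
```

A check like this could run in CI or a pre‑commit hook against the image fields of the deployment manifests, so an unpinned reference is caught before it reaches the cluster.
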

Education

Master of Mathematics, Applied Mathematics — University of Waterloo

  • Research: Computational Fluid Dynamics, Scientific Computing, Numerical Analysis
  • Courses: Fluid Mechanics; Numerical PDEs; Transport Phenomena & Multiphase Flow; Applied Functional Analysis
  • Thesis: H(div)-conforming Discontinuous Galerkin Methods for Multiphase Flow

B.Sc., Computer Science & Mathematics — Thompson Rivers University

  • GPA: 4.21 / 4.33 · Dean's Honour Roll (2016, 2017, 2018)
  • CS: Advanced Web Design & Programming; Networks; Data Structures; DB Systems; Operating Systems
  • Math: Graph Theory; Probability; Statistics; Differential Equations; Topology; Linear Programming; Linear Algebra; Discrete Math

Contact info