Aaron M. Hosford

Machine Learning Researcher and Engineer

Website | GitHub | HTML Resume | PDF Resume | LinkedIn | Patent Info


I've been programming for more than 30 years, with over a decade and a half of professional experience in Python. I also have experience with C, C++, C#, Java, and many other languages. I've been a machine learning and AI enthusiast since I first taught myself to program at the age of 12. In the workplace, I have helped to build a major NLP application serving tens of millions of customers. I've also architected and implemented multiple predictive analytics pipelines, one of which was a demand forecasting system projected to save the company millions of dollars. At home, I spend my time working on open-source projects, experimental machine learning algorithms, and my NLU system, NPC.

Select Open-Source Projects


XCS (Accuracy-based Classifier System)

XCS is an algorithm invented by Stewart W. Wilson, Ph.D. It is a type of Learning Classifier System (LCS), a machine learning algorithm that applies a genetic algorithm to a rule-based system in order to solve reinforcement learning problems. You can check out my open-source implementation of the algorithm here.

Pyramids Semantic Parser

The Pyramids Semantic Parser is a pure-Python parser I built for the purpose of extracting semantic information from natural language. I use it in my NLU system, NPC. It is a work in progress, but can already handle a broad cross section of natural language inputs.


Attila

Attila is a framework for automating processes with Python in business computing environments. I implemented it during my time with Ericsson, where it was used extensively to automate ETL, reporting, and web-scraping scripts, as well as to power an automated machine learning pipeline for generating product demand forecasts.

Less Naive Bayes

Less Naive Bayes is a machine learning classification algorithm related to Naive Bayes, but capable of handling input spaces that are not linearly separable. It works by successively training new sub-classifiers to predict both the target classification and the misclassifications of earlier classifiers. The feature space is successively subdivided into linearly separable subspaces that the sub-classifiers are better able to distinguish. Eventually the feature space is transformed sufficiently that the most recent layer can correctly classify all samples with distinguishable features.
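The layering idea described above can be sketched in a toy form. This is a simplified reading, not the project's actual implementation: it uses a from-scratch Bernoulli Naive Bayes, and it assumes (as an illustration) that each layer's training-set predictions are appended to the feature vectors so the next layer can model the earlier layer's mistakes, stopping once a layer classifies all training samples correctly.

```python
def train_nb(X, y):
    """Train a Bernoulli Naive Bayes with Laplace smoothing on binary
    features. Returns a predict function for a single feature vector."""
    classes = sorted(set(y))
    n = len(y)
    prior = {c: sum(1 for t in y if t == c) / n for c in classes}
    d = len(X[0])
    # p[c][j] = P(feature j = 1 | class c), Laplace-smoothed
    p = {}
    for c in classes:
        rows = [x for x, t in zip(X, y) if t == c]
        p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(d)]

    def predict(x):
        def score(c):
            s = prior[c]
            for j, v in enumerate(x):
                s *= p[c][j] if v else (1 - p[c][j])
            return s
        return max(classes, key=score)

    return predict


def less_naive_bayes(X, y, max_layers=5):
    """Sketch of the layering scheme (an assumption, not the project's
    exact method): stack NB layers, feeding each layer's training-set
    predictions back in as an extra binary feature."""
    X = [list(row) for row in X]
    layers = []
    for _ in range(max_layers):
        clf = train_nb(X, y)
        layers.append(clf)
        preds = [clf(x) for x in X]
        if preds == list(y):  # this layer classifies all samples correctly
            break
        # Augment the feature space so the next layer can learn to
        # correct the current layer's misclassifications.
        X = [x + [p] for x, p in zip(X, preds)]
    return layers


# Example: XOR labels, a classic non-linearly-separable case for NB.
layers = less_naive_bayes([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 0])
```

The augmentation step is the essential move: each appended prediction column carves the original feature space into subspaces on which the conditional-independence assumption is less badly violated, which is what the description above refers to as successive subdivision.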