Gaussian Mixture Models & Expectation Maximization

Recently, I’ve had some spare time to go back through some material from the end of college that I didn’t think I’d quite mastered (thanks covid!). One of these topics is the Gaussian mixture model and how we can use expectation maximization to fit it.

Read More

Active Learning via Ensembles

This post is a paraphrased version of a report I submitted for a machine learning course. I found the topic to be quite interesting, so I’m reposting it here.

The field of active learning has many different approaches. This section focuses on the Query-by-Committee (QbC) framework, which uses ensembling methods to choose the best sample to send to the oracle for a label. There are generally two parts to this approach. The first is to construct and train a model ensemble. Two methods are implemented in this work: bagging and boosting. Bagging has the advantage of simplicity, but boosting often gives a larger performance increase. The second is to find the most informative example to send to the oracle. This is done by finding the example on which the classifiers “disagree” the most, where disagreement can be measured in a variety of ways, including entropy and KL divergence. Overall, the QbC method achieves accuracy comparable to or greater than that of a classifier trained on the whole dataset, but with a vastly reduced number of labeled samples. This work proposes a new QbC framework called jackAL, based on the jackknife; this method offers an advantage over the others because it allows the model to make the most of small quantities of data, which is often all that is available when active learning is required. A variation on the jackknife, jackknife-k, is explored as well.
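To make the “disagreement” step concrete, here is a minimal sketch of one common entropy-based measure, vote entropy. It is not the code from the report: the function names `vote_entropy` and `select_query` are hypothetical, and the committee votes are hard-coded illustration data rather than predictions from a trained bagging or boosting ensemble.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the predicted-label distribution across committee members.

    `votes` holds one predicted label per committee member for a single
    unlabeled sample. A score of 0 means every member agrees.
    """
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_query(committee_votes):
    """Return the index of the pool sample the committee disagrees on most."""
    scores = [vote_entropy(v) for v in committee_votes]
    return max(range(len(scores)), key=scores.__getitem__)


# Illustration only: three unlabeled samples, a committee of three classifiers.
pool_votes = [
    [0, 0, 0],  # unanimous
    [0, 1, 0],  # one dissenting vote
    [0, 1, 2],  # every member predicts a different label
]
query_index = select_query(pool_votes)
```

With these made-up votes, the third sample is chosen because its votes are spread evenly over three labels. A KL-divergence variant would compare each member’s predicted class probabilities with the committee average instead of counting hard votes.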

Read More

Finite State Automata

I’ve come across finite state automata (also known as finite state machines) in several different contexts. After all, regular languages are context-free (this joke will not be funny until later). One of the more interesting aspects of computer science is how the same ideas pop up in areas that appear to be totally unconnected.

Read More

Pretty Printing of .csv's in the terminal

I spend much of my time ssh’ed into a remote machine (my school’s high performance computing cluster) and often come across .csv files that I’d like to view. cat, although fast, does not handle .csv’s in any special way, and if the .csv is not short and simple, the output can be unintelligible. I was recently digging through my organization’s .bashrc and found this handy script.

function pcsv() {

Read More