Output from the chuckcode generative model
vtreat - prepping your data
“80% of data analysis is spent on the process of cleaning and preparing the data”
– Dasu and Johnson 2003
In the glamorous world of big data and deep learning, getting your hands into the dirt of real-world data is often left as an exercise to the analyst, with hardly a footnote in the resulting paper or software.
With the advent of open source software, more and more people have to deal with the pain of real data.
read more
Personalized PageRank
PageRank is a metric developed by Google founders Larry Page and Sergey Brin. It is defined on a directed graph $G=(V,E)$ as the probability that a random walk over the edges will be at any particular node $v_i$. Since there may be vertices that only point to each other (spider traps), they introduced the idea of a random restart: with probability $\beta$ the walk jumps to a random vertex, which is related to what is also known as the damping factor.
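A minimal power-iteration sketch of this (the function name `pagerank` and the uniform handling of dangling nodes are my own choices, not from the post; passing a non-uniform `restart` vector gives the personalized variant):

```python
import numpy as np

def pagerank(adj, beta=0.15, restart=None, iters=100):
    # adj[i, j] = 1 if there is an edge i -> j.
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Row-normalize into a transition matrix; dangling nodes
    # (no out-edges) jump uniformly.
    P = np.where(out_deg > 0, adj / np.maximum(out_deg, 1), 1.0 / n)
    # Uniform restart gives classic PageRank; a one-hot or
    # user-specific distribution gives personalized PageRank.
    v = np.full(n, 1.0 / n) if restart is None else restart
    r = v.copy()
    for _ in range(iters):
        r = beta * v + (1 - beta) * (P.T @ r)
    return r
```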
read more
Multi-classification
So you want to predict more than two classes? Well, life is hard and it isn't as easy. Here are some approaches.
Softmax
For binary classification we would normally take our $z = w^t x$ and calculate the logistic function $ \frac{1}{1+e^{-z}} $. When there are multiple classes there are a few different approaches. The first is the softmax:
$$ \sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} $$ If you have a large number of classes it is computationally expensive to calculate $\sigma(z)_j$ for each class, so you can use a hierarchical softmax.
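A quick numerical sketch of the formula above (the array values are made up for illustration):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is
    # unchanged because softmax is shift-invariant.
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1
```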
read more
Factorization Machines
Steffen Rendle's Factorization Machines paper is a cool tour de force showing off a new machine learning approach. He proposes a new kind of linear model that can handle very sparse data and automatically model interactions between different variables. Not only is it interesting theoretically, but variants have been winners in the Kaggle ad CTR prediction contests for Criteo and Avazu, and have been tried out in real advertising systems at scale and performed well.
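The degree-2 model can be evaluated in $O(kn)$ time using an identity from the paper; here is a minimal numpy sketch (the names `fm_predict`, `w0`, `w`, `V` are mine, not Rendle's):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Factorization machine prediction with degree-2 interactions.
    x: (n,) features, w0: bias, w: (n,) linear weights, V: (n, k) factors.
    Uses the identity: sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]."""
    linear = w0 + w @ x
    s = V.T @ x                 # (k,) per-factor weighted sums
    s2 = (V ** 2).T @ (x ** 2)  # (k,) per-factor sums of squares
    return linear + 0.5 * np.sum(s ** 2 - s2)
```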
read more
Facebook Practical Lessons
Xinran He et al., "Practical Lessons from Predicting Clicks on Ads at Facebook"
Interesting to note that of the 11 authors, 5 had already left Facebook at the time of writing.
They use Normalized Cross Entropy (NE) and calibration as their major evaluation metrics.
Normalized entropy is the model's predictive log loss normalized by the entropy of the background CTR, i.e. what you gain over just predicting the empirical CTR of the training data set.
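A minimal sketch of that metric (function and variable names are mine; `y` is the 0/1 label vector and `p` the predicted probabilities):

```python
import numpy as np

def normalized_entropy(y, p):
    # Model log loss divided by the log loss of always predicting
    # the background CTR. Lower is better; NE < 1 means the model
    # beats the empirical-CTR baseline.
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)
    ll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    ctr = np.mean(y)
    baseline = -(ctr * np.log(ctr) + (1 - ctr) * np.log(1 - ctr))
    return ll / baseline
```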
read more
Wide and Deep Models
Google Play recommendations use a wide-and-deep modeling approach, according to Heng-Tze Cheng, Levent Koc et al. The basic idea is that memorization of co-occurrence for app installs is very powerful when there is a clear signal, but it is hard for new applications and doesn't generalize well. They use a deep learning model over more primitive features (e.g. words, category, etc.) to get better generalization, and combine it with memorization to get something like the best results from each.
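A minimal sketch of combining the two paths in Keras (the feature sizes and layer widths here are hypothetical, not from the paper):

```python
import tensorflow as tf

n_cross, vocab, emb_dim = 1000, 5000, 32  # made-up sizes

# Wide path: linear model over sparse cross features (memorization).
wide_in = tf.keras.Input(shape=(n_cross,), name="cross_features")
# Deep path: embeddings of primitive features (generalization).
deep_in = tf.keras.Input(shape=(10,), dtype="int32", name="token_ids")
deep = tf.keras.layers.Embedding(vocab, emb_dim)(deep_in)
deep = tf.keras.layers.Flatten()(deep)
deep = tf.keras.layers.Dense(128, activation="relu")(deep)

# Joint sigmoid output over both paths.
both = tf.keras.layers.concatenate([wide_in, deep])
out = tf.keras.layers.Dense(1, activation="sigmoid")(both)

model = tf.keras.Model([wide_in, deep_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```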
read more
Beta Prior for a Binomial
"We're all Bayesians in the foxhole"
How can we use prior data to help with low counts? It happens all the time: we're trying to estimate the rate of something (e.g. a clickthrough rate) but we've got low counts for some items. When somebody sorts by clickthrough rate, the top results look like trash or don't make sense. Often the reason is that the low counts make for high variance.
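A minimal sketch of Beta-prior smoothing (the pseudo-counts `alpha` and `beta` here are hypothetical; in practice you might fit them to the overall CTR distribution):

```python
def smoothed_ctr(clicks, impressions, alpha=2.0, beta=98.0):
    # Posterior mean of a Beta(alpha, beta) prior after observing
    # `clicks` successes in `impressions` trials.
    return (clicks + alpha) / (impressions + alpha + beta)

print(smoothed_ctr(1, 2))       # raw 0.50 shrunk toward the prior mean 0.02
print(smoothed_ctr(500, 1000))  # plenty of data, stays near 0.46
```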
read more
Back to basics - logistic regression
“All models are wrong, but some are useful”
– George Box
Logistic regression is used to predict binary variables and is characterized by the logistic function:
$$ \begin{align*} \text{logistic}(z) = \sigma(z) &= \frac{e^z}{e^z + 1} \\ &= \frac{1}{1 + e^{-z}} \end{align*} $$ Cost function: with $\mathbf{X}$ being the observed variables, $\boldsymbol{\beta}$ and $\beta_0$ the fitted parameters, $z = \beta_0 + \boldsymbol{\beta}^t\mathbf{X}$, and $\text{logistic}(z)$ the predicted value (also known as $\hat{y}$), with $y$ being the true label.
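A minimal gradient-descent sketch of fitting these parameters (the names `fit_logistic`, `lr`, and `steps` are mine, not from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=1000):
    n, d = X.shape
    beta0, beta = 0.0, np.zeros(d)
    for _ in range(steps):
        y_hat = sigmoid(beta0 + X @ beta)  # predicted probabilities
        err = y_hat - y                    # gradient of log loss w.r.t. z
        beta0 -= lr * err.mean()
        beta -= lr * (X.T @ err) / n
    return beta0, beta
```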
read more