Beautiful code is elegant and simple; to achieve this, programmers need to strike a fine balance between readability and conciseness. I have generally disliked the verbosity of languages like C#, especially when compared with the simplicity of Octave or Python. I have been dabbling with F# for data analysis, simple DSLs for rules engines, and data-processing utilities. My experience so far has been very positive, and I have been surprised by how compact and readable the code remains, even when I haven't looked at it for several months. I illustrate this with a simple implementation of gradient descent, used to optimise the cost function in regression learning. Let's start with a basic primer on linear regression:
Linear regression involves formulating a hypothesis about the relationship between variables of interest, e.g. years of education vs. potential earnings. The multiple-variable hypothesis function can be written as:
hθ(x) = θ0 + θ1·x1 + θ2·x2 + … + θn·xn
where θ0 is the zero bias, θ1..θn are the weights, x1..xn are the input variables
The weights are estimated by minimising the sum-of-squared-errors cost function given by:
C(θ0, θ1, …, θn) = (1/(2m))·∑i=1..m (hθ(xi) − yi)²
where m is the number of observations and y is the output vector for the training set.
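For completeness, the cost function translates into a couple of lines of F# with the MathNet.Numerics package (the name `cost` is my own; the descent code later in the post does not actually need it):

```fsharp
open MathNet.Numerics.LinearAlgebra

// C(θ) = 1/(2m) · Σ (hθ(xi) − yi)², vectorised as the dot product
// of the error vector with itself
let cost (X: Matrix<float>) (y: Vector<float>) (θ: Vector<float>) =
    let e = X * θ - y
    (e * e) / (2.0 * float y.Count)
```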
The cost function can be minimised using the gradient (hill) descent method, which involves a simultaneous update of every parameter: θj := θj − α·∂C(θ)/∂θj for j = 0..n, where α is the learning rate.
The above equation simplifies to:

θ0 := θ0 − α·(1/m)·∑i=1..m (hθ(xi) − yi)

for j = 0, and

θj := θj − α·(1/m)·∑i=1..m (hθ(xi) − yi)·xij

for j = 1..n, where xij is the j-th feature of the i-th observation.
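With the m training examples stacked as rows of a matrix X (whose first column is all ones, to carry the zero bias θ0), the two update rules above collapse into a single vectorised update:

θ := θ − (α/m)·Xᵀ·(X·θ − y)

which is the form the F# code below implements.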
Now let's look at the F# version of the hypothesis and gradient descent functions. We are going to use a vectorised version of the above equations to avoid having to deal with loops and subscripts. The code below uses the MathNet.Numerics package for matrix and vector operations.
open MathNet.Numerics.LinearAlgebra

// hypothesis: predictions for all observations at once
let h (X: Matrix<float>) (θ: Vector<float>) = X * θ

// one simultaneous update: θ := θ − (α/m)·Xᵀ·(Xθ − y)
// (vector * matrix in MathNet computes vᵀ·X, i.e. Xᵀ·v)
let gradientdescent (X: Matrix<float>) (y: Vector<float>) (θ: Vector<float>) (α: float) =
    θ - (α / float y.Count) * (((h X θ) - y) * X)
Let's test it out with an example:
// Slice the training data set A into the X input features and the y output vector,
// appending a unit column to account for the zero bias
let A = matrix [[1.; 0.5]; [2.; 1.]; [3.; 1.5]]
let X = DenseMatrix.Create(A.RowCount, 1,1.0).Append(A.[0..,0..0])
let y = A.[0..,1]
let mutable θ = vector [0. ;0.]
let α = 0.1
// descend for 1500 iterations
for i in 1..1500 do
    θ <- gradientdescent X y θ α
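Since this training set satisfies y = 0.5·x exactly, θ should converge towards [0.0; 0.5], and predictions for new inputs simply reuse the hypothesis function (the input value 4.0 here is an arbitrary example):

```fsharp
// prepend the unit bias term to the new input, then apply the hypothesis
let xNew = matrix [[1.0; 4.0]]
let prediction = h xNew θ   // ≈ 2.0, since the fit is y ≈ 0.5·x
```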
The same gradient descent function also works for logistic regression, simply by using a sigmoid hypothesis function.
let sigmoid (X: Matrix<float>) (θ: Vector<float>) = 1.0 / (1.0 + (X * θ).Negate().PointwiseExp())
let h (X:Matrix<float>) (θ: Vector<float>) = sigmoid X θ
Apart from some type annotations, and ignoring the hard-coded test data, the code is pretty self-explanatory and demonstrates a good balance between readability and conciseness. The above is a rather naïve implementation of linear regression. You can extend it by introducing functions to load data sets, slice them into input and output features, apply feature scaling, cool α over time, and end the descent on convergence to some epsilon.
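As a sketch of that last suggestion, the fixed iteration count can be replaced by a recursive descent that stops once the parameter update becomes smaller than some ε (the function name and stopping rule here are my own choices, not part of the code above):

```fsharp
// Descend until the parameter update is smaller than ε, with a
// maximum iteration count as a safety net against a diverging α.
let rec descendUntil (X: Matrix<float>) (y: Vector<float>) (θ: Vector<float>) (α: float) (ε: float) maxIter =
    let θ' = gradientdescent X y θ α
    if (θ' - θ).L2Norm() < ε || maxIter <= 1 then θ'
    else descendUntil X y θ' α ε (maxIter - 1)

let θfinal = descendUntil X y (vector [0.; 0.]) 0.1 1e-9 10000
```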