Monday, 4 April 2016

F#: Readability vs Conciseness

Beautiful code is elegant and simple; however, to achieve this goal programmers need to achieve the fine balance between readability and conciseness. Generally, I have hated the verbosity of languages like C#, especially when compared to the simplicity of octave or python. I have been dabbling with the use of F# in data analysis, simple dsl for rules engines and data processing utilities. My experience thus far has been a very positive one and have been surprised by how compact  and readable it is, even if I havent looked at it for several months. I illustrate this opinion with a simple implementation of gradient descent used to optimise the cost function in regression learning. Let's start with a basic primer on linear regression:

Linear regression involves formulating an hypothesis about the relationship between variables of interest e.g. years of educations vs potential earnings. The multiple variable hypothesis function can be written as:

h(θ)=θ0+θ1x1+θ2x2+..+θnxn
where θ0 is the zero bias, θ1..θn are the weights, x1..xn are the input variables

The weights are estimated by minimising Sum of Squared Error function given by:
C(θ0,θ1,...θn)=12mi=1m(hθ(xi)yi)2
where m is the number of observations and y is the output vector for the training set.

The cost function can be minimised using the gradient(hill) descent method which involves a simultaneous update of  θj = θj - learning rate (α) times the partial derivative of C(θ) with respect to θj for j=0…n

The above equation simplifies to: 
θ0=1mi=1m(hθ(xi)yi)
for j=0
θj=1mi=1m(hθ(xi)yi)xi
for j=1..n 

Now lets look at the f# version of the hypothesis and gradient descent  function. We are going to use a vectorised version of the above code to avoid have to deal with loops and subscripts. The code below uses the  Mathnet.Numerics package for matrix and vector operations.

let h (X:Matrix<float>) (θ: Vector<float>) = X*θ 
let gradientdescent (X: Matrix<float>) (y:Vector<float>) (θ: Vector<float>) (α: float)  =
    θ - (α / float y.Count)*(((h X θ) - y)*X )

Lets test it out with an example:

//Slice the training data set A into X input features and y output features and append a unit vector to account for the zero bias 
let A = matrix [[ 1.;0.5]  [ 2.;1.]  [ 3.;1.5]]
let X = DenseMatrix.Create(A.RowCount, 1,1.0).Append(A.[0..,0..0])  
let y = A.[0..,1]
let mutable θ = vector [0. ;0.]
let α = 0.1

// descend for 1500 iterations
for i in 1..1500 do
   θ < - gradientdescent X y θ α  

Also, the gradient descent function works for logistic regression by simply using a sigmoid hypothesis function.

let sigmoid (x:Matrix<float>) (θ: Vector<float>) = 1.0/(1.0 + (X*θ).Negate().PointwiseExp())
let h (X:Matrix<float>) (θ: Vector<float>) = sigmoid X θ

Except some type pollution and ignoring the hardcoded test code, the code is pretty self-self-explanotory and demonstrates a good balance between readability and conciseness.The above code is a rather naïve implementation of linear regression. You can extend this by introducing functions to load data sets, slice into input and output features, feature scaling, alpha cooling and ending the descent on convergence to some epsilon.