#loss_function #cost_function

Approximation

$J(\theta) = \mathrm{MSE} = E\left[ (Y - \hat{Y})^2 \right]$

with MSE = the Mean Squared Error
     $Y$ = the ground truth output values for the training examples
     $\hat{Y}$ = the predicted output values for the training examples
     $E[z]$ = the mean estimator: $\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i$
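
A minimal sketch of this definition in Python with NumPy. The arrays `Y` and `Y_hat` hold made-up values chosen for illustration; they are not data from this note.

```python
import numpy as np

# Illustrative values only (assumed for this sketch)
Y = np.array([3.0, -0.5, 2.0, 7.0])      # ground truth outputs
Y_hat = np.array([2.5, 0.0, 2.0, 8.0])   # predicted outputs

# E[(Y - Y_hat)^2]: the mean of the squared errors
mse = np.mean((Y - Y_hat) ** 2)
print(mse)  # 0.375
```

The squared errors here are 0.25, 0.25, 0 and 1, so their mean over the 4 examples is 1.5 / 4 = 0.375.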

Expanded

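Unlike the Mean Squared Error (MSE), which is often written with an extra factor of 1/2 so that it cancels the 2 produced by differentiating the square, the expression of the RMSE does not need to be divided by 2, because the square root already eases the descent.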

$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 = \frac{1}{m} \sum_{i=1}^{m} (y_i - h_\theta(x_i))^2$

with $m$ = the number of training examples
     $x_i$ = the input (feature) of the $i$th training example
     $y_i$ = the ground truth output of the $i$th training example
     $h_\theta(x_i)$ or $\hat{y}_i$ = the predicted output of the $i$th training example
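
The sum above can be sketched as an explicit loop in plain Python. The linear hypothesis `h` with a single feature and the sample data are assumptions made for this illustration; the note itself does not fix the form of $h_\theta$.

```python
def h(theta, x):
    """Hypothetical linear hypothesis: h_theta(x) = theta0 + theta1 * x (assumed)."""
    theta0, theta1 = theta
    return theta0 + theta1 * x


def cost(theta, xs, ys):
    """J(theta) = (1/m) * sum over i of (y_i - h_theta(x_i))^2."""
    m = len(xs)
    total = 0.0
    for x_i, y_i in zip(xs, ys):
        total += (y_i - h(theta, x_i)) ** 2
    return total / m


# Illustrative data only: the points lie exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(cost((0.0, 2.0), xs, ys))  # perfect fit, so the cost is 0.0
print(cost((0.0, 1.0), xs, ys))  # errors 1, 2, 3 give (1 + 4 + 9) / 3
```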

Vectorized

$J(\theta) = \frac{1}{m} (X\theta - y)^T (X\theta - y)$

with $X$ = a matrix of the training examples, arranged as rows of $X$
     $y$ = a vector of all the expected output values
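
A minimal NumPy sketch of the vectorized form. The leading column of ones in `X` (so that the first entry of `theta` acts as an intercept) and the sample values are assumptions made for this illustration.

```python
import numpy as np


def cost_vectorized(theta, X, y):
    """J(theta) = (1/m) * (X theta - y)^T (X theta - y)."""
    m = X.shape[0]              # number of training examples (rows of X)
    residual = X @ theta - y    # vector of prediction errors
    return (residual.T @ residual) / m


# Illustrative data only: one feature plus an assumed column of ones
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 1.0])

J = cost_vectorized(theta, X, y)

# The vectorized form agrees with the element-wise mean of squared errors
assert np.isclose(J, np.mean((X @ theta - y) ** 2))
```

The inner product of the residual vector with itself is the sum of squared errors, so no explicit loop over the $m$ examples is needed.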