Machine Learning

By Bharat Sreeram
bsreeram.datascience@gmail.com
sankara.deva2016@gmail.com
6309613028
--------------------------------------------------------------------



Machine Learning Session 1: Dec-5th-2018

https://global.gotomeeting.com/play/recording/d7c5c19ef6122846296292df6437e4f025f2e1393e60e72bfab1fd4ff97d26e3

Machine Learning Session 2: Dec-6th-2018

https://global.gotomeeting.com/play/recording/07314ce010d0ee64cebf32c399ab3993f05764ecd2c363c81a32e194b610ca6a

notes:

https://drive.google.com/file/d/1aVG9-No4h6fi8GY9QcZmwZe1IiVoL_CV/view?usp=sharing

Machine Learning Session 3: Dec-10th-2018

https://global.gotomeeting.com/play/recording/19cafe83765b2160951e806110e6bbedbc17eaf227341c2477c73e4a400d276d

Machine Learning Session 4: Dec-11th-2018

https://global.gotomeeting.com/play/recording/34593c679b22f3d51f22c1e61ad2c2f752a636c677d387d425cbc96e9f4cb654

notes:

https://drive.google.com/file/d/1dH65bwfH-KgrfTzPNp8hIBy6nBn3SrDj/view?usp=sharing

Machine Learning Session 5: Dec-12th-2018

https://global.gotomeeting.com/play/recording/0b1c3ba66022757756589a1d99abb2276d489108e35fbed78f05ed0b667ecff7

notes:

https://drive.google.com/file/d/1itBDGEH-E6JV9TmGAzjz_TvMhDnmGEME/view?usp=sharing

Machine Learning Session 6: Dec-13th-2018

https://register.gotowebinar.com/recording/8624091316151458310

notes:

https://drive.google.com/file/d/16k7HIybwRHYe7YhdEFNKTKAZAhoMeTDu/view?usp=sharing

Machine Learning Session 7: Dec-18th-2018

https://register.gotowebinar.com/recording/772111545429734659

notes:

https://drive.google.com/file/d/1TQjkFmYqBvhmpbMXJqzIw4_S_lpsc2dl/view?usp=sharing

Machine Learning Session 8:

https://register.gotowebinar.com/recording/410121531262710785

Machine Learning Session 9:

https://register.gotowebinar.com/recording/8589759065729297677

Machine Learning Session 10:

https://register.gotowebinar.com/recording/1690573190596158466

Machine Learning Session 11:

https://register.gotowebinar.com/recording/2632740306457288715

Machine Learning Session 12:

https://register.gotowebinar.com/recording/6611190090724288007

Machine Learning Session 13:

https://register.gotowebinar.com/recording/6446329317299855105

Machine Learning Session 14:

Implementation of Linear Regression using the R language (a statistical approach).

Data file: profiles.txt
Schema: name,age,exp,qual,income
"name","age","exp","qual","income"
aaa,21,0,btech,20000
bbbb,22,1,btech,25000
ccc,21,0,mtech,25000
dddd,22,1,mtech,30000
ee,25,4,btech,40000
ffff,25,3.5,mtech,47000
gggg,25,4,mtech,50000
hhh,30,8,btech,80000
jjjj,31,8,mtech,91000
eejj,31,9,mtech,95000

task:
construct the relationship between age, exp, qual and income.

  Y = X.beta
  beta = inv(t(X).X) . (t(X).Y)

Using a custom function for beta (the coefficient matrix).
> df = read.csv('D:/mystuff/profiles.txt')
> df
   name age exp  qual income
1   aaa  21 0.0 btech  20000
2  bbbb  22 1.0 btech  25000
3   ccc  21 0.0 mtech  25000
4  dddd  22 1.0 mtech  30000
5    ee  25 4.0 btech  40000
6  ffff  25 3.5 mtech  47000
7  gggg  25 4.0 mtech  50000
8   hhh  30 8.0 btech  80000
9  jjjj  31 8.0 mtech  91000
10 eejj  31 9.0 mtech  95000
> df$age
 [1] 21 22 21 22 25 25 25 30 31 31
> 

> df$q = 5
> df
   name age exp  qual income q
1   aaa  21 0.0 btech  20000 5
2  bbbb  22 1.0 btech  25000 5
3   ccc  21 0.0 mtech  25000 5
4  dddd  22 1.0 mtech  30000 5
5    ee  25 4.0 btech  40000 5
6  ffff  25 3.5 mtech  47000 5
7  gggg  25 4.0 mtech  50000 5
8   hhh  30 8.0 btech  80000 5
9  jjjj  31 8.0 mtech  91000 5
10 eejj  31 9.0 mtech  95000 5

> df$q[df$qual=='mtech']=8
> df
   name age exp  qual income q
1   aaa  21 0.0 btech  20000 5
2  bbbb  22 1.0 btech  25000 5
3   ccc  21 0.0 mtech  25000 8
4  dddd  22 1.0 mtech  30000 8
5    ee  25 4.0 btech  40000 5
6  ffff  25 3.5 mtech  47000 8
7  gggg  25 4.0 mtech  50000 8
8   hhh  30 8.0 btech  80000 5
9  jjjj  31 8.0 mtech  91000 8
10 eejj  31 9.0 mtech  95000 8

> X = data.frame(one=1, a=df$age, e=df$exp, q=df$q)
> X
   one  a   e q
1    1 21 0.0 5
2    1 22 1.0 5
3    1 21 0.0 8
4    1 22 1.0 8
5    1 25 4.0 5
6    1 25 3.5 8
7    1 25 4.0 8
8    1 30 8.0 5
9    1 31 8.0 8
10   1 31 9.0 8
> class(X)
[1] "data.frame"
> X = data.matrix(X)
> class(X)
[1] "matrix"
> X
      one  a   e q
 [1,]   1 21 0.0 5
 [2,]   1 22 1.0 5
 [3,]   1 21 0.0 8
 [4,]   1 22 1.0 8
 [5,]   1 25 4.0 5
 [6,]   1 25 3.5 8
 [7,]   1 25 4.0 8
 [8,]   1 30 8.0 5
 [9,]   1 31 8.0 8
[10,]   1 31 9.0 8
> dim(X)
[1] 10  4
> 
> Y = matrix(df$income)
> Y
       [,1]
 [1,] 20000
 [2,] 25000
 [3,] 25000
 [4,] 30000
 [5,] 40000
 [6,] 47000
 [7,] 50000
 [8,] 80000
 [9,] 91000
[10,] 95000
> 



> coeffs = function(x,y){
+      l = solve(t(x)%*%x)
+      r = t(x)%*%y
+      l%*%r
+ }
> beta = coeffs(X,Y)
> beta
           [,1]
one -155369.530
a      7774.047
e     -1078.645
q      1932.194
> 
Deriving coefficients using predefined functions:
> lmfit = lm(income ~ age + exp + q , data=df)
> beta = cbind(coefficients(lmfit))
> beta
                   [,1]
(Intercept) -155369.530
age            7774.047
exp           -1078.645
q              1932.194
> 
Quadratic polynomial model (to fit a non-linear relationship using a linear model, via squared terms).
 a=df$age
 e = df$exp
 q = df$q
 a2=a^2
 e2 = e^2
 q2 = q^2
i=df$income
beta2 =  cbind(coefficients(lm(i ~a+ a2 + e+e2+q+q2)))
> # deriving cubic polynomial coefficients.
> a3 = a^3
> e3 = e^3
> q3 = q^3
> beta3 = cbind(coefficients(lm(i ~ a + a2 + a3 +
+                                   e + e2 + e3 +
+                                   q + q2 + q3)))
> 
> beta3
                    [,1]
(Intercept) 2691084.7350
a           -304049.7494
a2            11227.3276
a3             -134.8921
e             15809.6330
e2            -2587.9205
e3              148.3180
q              2222.2222
q2                    NA
q3                    NA
> 
Note the NA coefficients for q2 and q3: q takes only the two values 5 and 8, so q^2 and q^3 are exact linear functions of q, and lm() drops them as collinear.

In the next class we will see how to measure the accuracy of each model.


Machine Learning Session 15: 

    Developing a predict function and testing accuracy.

> predict = function(x,b){
+   ycap = x %*% b
+   ycap
+ }

input matrix. 
>  X = cbind(1, df$age, df$exp, df$q)


> dim(X)
[1] 10  4
> X
      [,1] [,2] [,3] [,4]
 [1,]    1   21  0.0    5
 [2,]    1   22  1.0    5
 [3,]    1   21  0.0    8
 [4,]    1   22  1.0    8
 [5,]    1   25  4.0    5
 [6,]    1   25  3.5    8
 [7,]    1   25  4.0    8
 [8,]    1   30  8.0    5
 [9,]    1   31  8.0    8
[10,]    1   31  9.0    8


> ycap = predict(X, beta)
> dim(ycap)
[1] 10  1
> ycap
          [,1]
 [1,] 17546.43
 [2,] 24241.83
 [3,] 23343.01
 [4,] 30038.42
 [5,] 44328.04
 [6,] 50663.94
 [7,] 50124.62
 [8,] 78883.70
 [9,] 92454.33
[10,] 91375.68

How to test the accuracy of model predictions:

Accuracy is based on the closeness between each actual
target value and its predicted value.

ex:
  a is the actual age
  acap is the predicted age

  Then the distance between actual and predicted:

> a=25
> acap=26
> abs(a-acap)/acap * 100
[1] 3.846154

here the distance is about 3.85%

Then the closeness between actual and predicted is:

  100 - distance


> 100 - abs(a-acap)/acap * 100
[1] 96.15385

Here the closeness is about 96%; if the expected closeness is 90%,
then the above prediction is good.

Let's apply this accuracy measurement to our predictions.

ex: Y (actual incomes) and ycap (predicted incomes)

> Y = cbind(df$income)
> Y
       [,1]
 [1,] 20000
 [2,] 25000
 [3,] 25000
 [4,] 30000
 [5,] 40000
 [6,] 47000
 [7,] 50000
 [8,] 80000
 [9,] 91000
[10,] 95000

> ycap
          [,1]
 [1,] 17546.43
 [2,] 24241.83
 [3,] 23343.01
 [4,] 30038.42
 [5,] 44328.04
 [6,] 50663.94
 [7,] 50124.62
 [8,] 78883.70
 [9,] 92454.33
[10,] 91375.68

> dist = abs(Y-ycap)/ycap * 100
> dist
            [,1]
 [1,] 13.9832956
 [2,]  3.1275150
 [3,]  7.0984295
 [4,]  0.1278863
 [5,]  9.7636619
 [6,]  7.2318577
 [7,]  0.2486241
 [8,]  1.4151260
 [9,]  1.5730205
[10,]  3.9663939

> closeness = 100 - dist
> closeness
          [,1]
 [1,] 86.01670
 [2,] 96.87248
 [3,] 92.90157
 [4,] 99.87211
 [5,] 90.23634
 [6,] 92.76814
 [7,] 99.75138
 [8,] 98.58487
 [9,] 98.42698
[10,] 96.03361

> closeness>=90
       [,1]
 [1,] FALSE
 [2,]  TRUE
 [3,]  TRUE
 [4,]  TRUE
 [5,]  TRUE
 [6,]  TRUE
 [7,]  TRUE
 [8,]  TRUE
 [9,]  TRUE
[10,]  TRUE

> closeness[closeness>=90]
[1] 96.87248 92.90157 99.87211 90.23634 92.76814 99.75138 98.58487
[8] 98.42698 96.03361
> length(closeness[closeness>=90])
[1] 9
> pcnt = length(closeness[closeness>=90])
> pcnt
[1] 9
> n = length(Y)
> n
[1] 10
> acc = pcnt/n * 100
> acc
[1] 90

Developing a function for accuracy testing.
---------------------------------------

> accuracy = function(y,ycap,closeness){
+     de = 100 - closeness
+     dist = abs(y-ycap)/ycap * 100
+     pcnt = length(dist[dist<=de])
+     n = length(y)
+     acc = pcnt/n * 100
+     acc
+ }
> accuracy(Y, ycap, 80)
[1] 100
> accuracy(Y, ycap, 85)
[1] 100
> accuracy(Y, ycap, 90)
[1] 90
> accuracy(Y, ycap, 95)
[1] 60

-------------------------------

Machine Learning Session 16:

Topic: Implementation of Regression models with Python NumPy.



  Linear Regression implementation with Python NumPy.
--------------------------------------------

 Data file:  profiles.txt

"name","age","exp","qual","income"
aaa,21,0,btech,20000
bbbb,22,1,btech,25000
ccc,21,0,mtech,25000
dddd,22,1,mtech,30000
ee,25,4,btech,40000
ffff,25,3.5,mtech,47000
gggg,25,4,mtech,50000
hhh,30,8,btech,80000
jjjj,31,8,mtech,91000
eejj,31,9,mtech,95000

# read the data file and skip the header row
# (path assumed; adjust to where profiles.txt lives)
lines = open('profiles.txt').readlines()[1:]

a = []
e = []
q = []
i = []
for line in lines:
    w = line.strip().split(',')
    age = float(w[1])
    exp = float(w[2])
    ql = 5                   # btech -> 5
    if w[3] == 'mtech':
        ql = 8               # mtech -> 8
    inc = float(w[-1])/1000  # income in thousands
    a.append(age)
    e.append(exp)
    q.append(ql)
    i.append(inc)
print(a)
print(e)
print(q)
print(i)
output:

[21.0, 22.0, 21.0, 22.0, 25.0, 25.0, 25.0, 30.0, 31.0, 31.0]
[0.0, 1.0, 0.0, 1.0, 4.0, 3.5, 4.0, 8.0, 8.0, 9.0]
[5, 5, 8, 8, 5, 8, 8, 5, 8, 8]
[20.0, 25.0, 25.0, 30.0, 40.0, 47.0, 50.0, 80.0, 91.0, 95.0]
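As an alternative to hand-splitting each line, Python's built-in csv module parses the quoted header row directly. A minimal sketch, with the file contents inlined through io.StringIO so it runs standalone (reading from profiles.txt would work the same way with open()):

```python
import csv
import io

# profiles.txt contents inlined for a self-contained example
data = '''"name","age","exp","qual","income"
aaa,21,0,btech,20000
bbbb,22,1,btech,25000
ccc,21,0,mtech,25000
dddd,22,1,mtech,30000
ee,25,4,btech,40000
ffff,25,3.5,mtech,47000
gggg,25,4,mtech,50000
hhh,30,8,btech,80000
jjjj,31,8,mtech,91000
eejj,31,9,mtech,95000
'''

a, e, q, i = [], [], [], []
reader = csv.DictReader(io.StringIO(data))  # field names come from the header row
for row in reader:
    a.append(float(row['age']))
    e.append(float(row['exp']))
    q.append(8 if row['qual'] == 'mtech' else 5)  # encode qual: btech -> 5, mtech -> 8
    i.append(float(row['income']) / 1000)         # income in thousands

print(q)
print(i)
```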


# preparing input matrix. 

import numpy as np
X = np.c_[np.ones(len(a)),a,e,q]
print(X)

[[ 1.  21.   0.   5. ]
 [ 1.  22.   1.   5. ]
 [ 1.  21.   0.   8. ]
 [ 1.  22.   1.   8. ]
 [ 1.  25.   4.   5. ]
 [ 1.  25.   3.5  8. ]
 [ 1.  25.   4.   8. ]
 [ 1.  30.   8.   5. ]
 [ 1.  31.   8.   8. ]
 [ 1.  31.   9.   8. ]]


# output(target) matrix
Y = np.c_[i]
print(Y)

[[20.]
 [25.]
 [25.]
 [30.]
 [40.]
 [47.]
 [50.]
 [80.]
 [91.]
 [95.]]

def coeffs(x,y):
    from numpy.linalg import inv
    l = inv(x.T.dot(x))
    r = x.T.dot(y)
    return l.dot(r)


# coefficients of Linear model. 

beta1 = coeffs(X,Y)
print(beta1)

[[-155.36953015]
 [   7.77404719]
 [  -1.07864489]
 [   1.93219399]]
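Explicitly inverting X.T.dot(X) works here, but numpy's np.linalg.lstsq solves the same least-squares problem without forming the inverse and is numerically more stable. A sketch (the data matrices are rebuilt inline so the snippet stands alone):

```python
import numpy as np

# rebuild the input and target matrices from the profiles data
a = [21, 22, 21, 22, 25, 25, 25, 30, 31, 31]
e = [0, 1, 0, 1, 4, 3.5, 4, 8, 8, 9]
q = [5, 5, 8, 8, 5, 8, 8, 5, 8, 8]
i = [20, 25, 25, 30, 40, 47, 50, 80, 91, 95]  # income in thousands

X = np.c_[np.ones(len(a)), a, e, q]
Y = np.c_[i]

# least-squares solve; no explicit inv(X.T @ X)
beta, residuals, rank, sv = np.linalg.lstsq(X, Y, rcond=None)
print(beta)   # matches the normal-equations coefficients above
```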


# preparing input matrix for quadratic model. 

ones = np.ones(len(a))
a = np.array(a)
a2 = a**2        # note: 'as' is a reserved word in Python, so a2
e = np.array(e)
es = e**2
q = np.array(q)
qs = q**2

XX = np.c_[ones, a, a2, e, es, q, qs]
print(XX)

[[  1.    21.   441.     0.     0.     5.    25.  ]
 [  1.    22.   484.     1.     1.     5.    25.  ]
 [  1.    21.   441.     0.     0.     8.    64.  ]
 [  1.    22.   484.     1.     1.     8.    64.  ]
 [  1.    25.   625.     4.    16.     5.    25.  ]
 [  1.    25.   625.     3.5   12.25   8.    64.  ]
 [  1.    25.   625.     4.    16.     8.    64.  ]
 [  1.    30.   900.     8.    64.     5.    25.  ]
 [  1.    31.   961.     8.    64.     8.    64.  ]
 [  1.    31.   961.     9.    81.     8.    64.  ]]


# coefficients for quadratic model. 

beta2 = coeffs(XX, Y)
print(beta2)

[[ 4.27008000e+05]
 [-3.17610019e+04]
 [ 5.43459599e+02]
 [ 9.43313010e+03]
 [-6.16516751e+02]
 [ 0.00000000e+00]
 [ 4.00000000e+00]]
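The odd-looking q terms (0.0 and 4.0) are a warning sign. q only takes the values 5 and 8, so q**2 is exactly the linear function 13*q - 40, which makes the columns of XX linearly dependent; XX.T.dot(XX) is then singular, and the inverse computed by inv() (and hence beta2, and likewise beta3 below) is numerically meaningless. A quick check of the rank, with the arrays rebuilt inline:

```python
import numpy as np

a = np.array([21, 22, 21, 22, 25, 25, 25, 30, 31, 31])
e = np.array([0, 1, 0, 1, 4, 3.5, 4, 8, 8, 9])
q = np.array([5, 5, 8, 8, 5, 8, 8, 5, 8, 8])

XX = np.c_[np.ones(len(a)), a, a**2, e, e**2, q, q**2]

# q takes only two values, so q**2 is an exact linear function of q
assert np.allclose(q**2, 13*q - 40)

# the design matrix has fewer independent columns than columns
print(np.linalg.matrix_rank(XX), XX.shape[1])   # rank < number of columns
```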


# input matrix for cubic model

a3 = a**3
e3 = e**3
q3 = q**3

XXX = np.c_[ones, a,a2,a3,e,es,e3,q,qs,q3]
print(XXX)


# coefficients  for cubic model

beta3 = coeffs(XXX, Y)
print(beta3)

[[ 5.46816000e+05]
 [-6.54644536e+04]
 [ 2.48430214e+03]
 [-3.09940604e+01]
 [ 2.98121043e+03]
 [-6.74362346e+02]
 [ 3.80136769e+01]
 [ 4.06400000e+03]
 [ 1.46000000e+02]
 [-4.60000000e+01]]


def predict(x,b):
    return x.dot(b)

ycap1 = predict(X, beta1)
ycap2 = predict(XX, beta2)
ycap3 = predict(XXX, beta3)


# predictions by linear model
print(ycap1)

[[17.54643073]
 [24.24183303]
 [23.3430127 ]
 [30.038415  ]
 [44.32803993]
 [50.66394434]
 [50.1246219 ]
 [78.88369631]
 [92.45432547]
 [91.37568058]]

# predictions by quadratic model

print(ycap2)
[[ -207.35603075]
 [  217.01820823]
 [  -51.35603075]
 [  373.01820823]
 [  613.45510333]
 [-1635.17213331]
 [  769.45510333]
 [ -600.44815338]
 [  945.58551178]
 [ -102.06914593]]


# predictions by cubic model

print(ycap3)
[[-1176.27672812]
 [ -459.63846022]
 [-1092.27672812]
 [ -375.63846022]
 [  399.22115776]
 [  718.43581996]
 [  483.22115776]
 [  288.18008023]
 [  -54.26574259]
 [ -288.24732376]]
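The accuracy figures reported below can be reproduced by porting the Session 15 accuracy function to numpy. A sketch; note that the relative distance here is taken against the actual value y (dividing by ycap, as in the R version, would give 90 rather than 80 for the linear model):

```python
import numpy as np

def accuracy(y, ycap, closeness):
    de = 100 - closeness                        # allowed relative distance, in percent
    dist = np.abs(y - ycap) / np.abs(y) * 100   # distance relative to the actual value
    return np.sum(dist <= de) / len(y) * 100

# actual and linear-model predicted incomes (in thousands), rebuilt inline
Y = np.c_[[20, 25, 25, 30, 40, 47, 50, 80, 91, 95]].astype(float)
ycap1 = np.c_[[17.54643073, 24.24183303, 23.3430127, 30.038415, 44.32803993,
               50.66394434, 50.1246219, 78.88369631, 92.45432547, 91.37568058]]

print(accuracy(Y, ycap1, 90))   # 80.0
```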


Accuracy of the linear model:    80.0
Accuracy of the quadratic model:  0.0
Accuracy of the cubic model:      0.0

The linear model is the best fit for the given data.
--------------------------------

The above are 16 Machine Learning sessions, including today's.




