Machine Learning

By Bharat Sreeram
bsreeram.datascience@gmail.com
sankara.deva2016@gmail.com
6309613028
--------------------------------------------------------------------



Machine Learning Session 1: Dec-5th-2018

https://global.gotomeeting.com/play/recording/d7c5c19ef6122846296292df6437e4f025f2e1393e60e72bfab1fd4ff97d26e3

Machine Learning Session 2: Dec-6th-2018

https://global.gotomeeting.com/play/recording/07314ce010d0ee64cebf32c399ab3993f05764ecd2c363c81a32e194b610ca6a

notes:

https://drive.google.com/file/d/1aVG9-No4h6fi8GY9QcZmwZe1IiVoL_CV/view?usp=sharing

Machine Learning Session 3: Dec-10th-2018

https://global.gotomeeting.com/play/recording/19cafe83765b2160951e806110e6bbedbc17eaf227341c2477c73e4a400d276d

Machine Learning Session 4: Dec-11th-2018

https://global.gotomeeting.com/play/recording/34593c679b22f3d51f22c1e61ad2c2f752a636c677d387d425cbc96e9f4cb654

notes:

https://drive.google.com/file/d/1dH65bwfH-KgrfTzPNp8hIBy6nBn3SrDj/view?usp=sharing

Machine Learning Session 5: Dec-12th-2018

https://global.gotomeeting.com/play/recording/0b1c3ba66022757756589a1d99abb2276d489108e35fbed78f05ed0b667ecff7

notes:

https://drive.google.com/file/d/1itBDGEH-E6JV9TmGAzjz_TvMhDnmGEME/view?usp=sharing

Machine Learning Session 6: Dec-13th-2018

https://register.gotowebinar.com/recording/8624091316151458310

notes:

https://drive.google.com/file/d/16k7HIybwRHYe7YhdEFNKTKAZAhoMeTDu/view?usp=sharing

Machine Learning Session 7: Dec-18th-2018

https://register.gotowebinar.com/recording/772111545429734659

notes:

https://drive.google.com/file/d/1TQjkFmYqBvhmpbMXJqzIw4_S_lpsc2dl/view?usp=sharing

Machine Learning Session 8:

https://register.gotowebinar.com/recording/410121531262710785

Machine Learning Session 9:

https://register.gotowebinar.com/recording/8589759065729297677

Machine Learning Session 10:

https://register.gotowebinar.com/recording/1690573190596158466

Machine Learning Session 11:

https://register.gotowebinar.com/recording/2632740306457288715

Machine Learning Session 12:

https://register.gotowebinar.com/recording/6611190090724288007

Machine Learning Session 13:

https://register.gotowebinar.com/recording/6446329317299855105

Machine Learning Session 14:

Implementation of Linear Regression using the R language (a statistical approach).

Data file: profiles.txt
Schema: name,age,exp,qual,income
"name","age","exp","qual","income"
aaa,21,0,btech,20000
bbbb,22,1,btech,25000
ccc,21,0,mtech,25000
dddd,22,1,mtech,30000
ee,25,4,btech,40000
ffff,25,3.5,mtech,47000
gggg,25,4,mtech,50000
hhh,30,8,btech,80000
jjjj,31,8,mtech,91000
eejj,31,9,mtech,95000

task:
construct the relationship between age, exp, qual and income.

  Y = X.beta
  beta = inv(t(X).X) . (t(X).Y)

Using a custom function for beta (the coefficient matrix).
> df = read.csv('D:/mystuff/profiles.txt')
> df
   name age exp  qual income
1   aaa  21 0.0 btech  20000
2  bbbb  22 1.0 btech  25000
3   ccc  21 0.0 mtech  25000
4  dddd  22 1.0 mtech  30000
5    ee  25 4.0 btech  40000
6  ffff  25 3.5 mtech  47000
7  gggg  25 4.0 mtech  50000
8   hhh  30 8.0 btech  80000
9  jjjj  31 8.0 mtech  91000
10 eejj  31 9.0 mtech  95000
> df$age
 [1] 21 22 21 22 25 25 25 30 31 31
> 

> df$q = 5
> df
   name age exp  qual income q
1   aaa  21 0.0 btech  20000 5
2  bbbb  22 1.0 btech  25000 5
3   ccc  21 0.0 mtech  25000 5
4  dddd  22 1.0 mtech  30000 5
5    ee  25 4.0 btech  40000 5
6  ffff  25 3.5 mtech  47000 5
7  gggg  25 4.0 mtech  50000 5
8   hhh  30 8.0 btech  80000 5
9  jjjj  31 8.0 mtech  91000 5
10 eejj  31 9.0 mtech  95000 5

> df$q[df$qual=='mtech']=8
> df
   name age exp  qual income q
1   aaa  21 0.0 btech  20000 5
2  bbbb  22 1.0 btech  25000 5
3   ccc  21 0.0 mtech  25000 8
4  dddd  22 1.0 mtech  30000 8
5    ee  25 4.0 btech  40000 5
6  ffff  25 3.5 mtech  47000 8
7  gggg  25 4.0 mtech  50000 8
8   hhh  30 8.0 btech  80000 5
9  jjjj  31 8.0 mtech  91000 8
10 eejj  31 9.0 mtech  95000 8

> X = data.frame(one=1, a=df$age, e=df$exp, q=df$q)
> X
   one  a   e q
1    1 21 0.0 5
2    1 22 1.0 5
3    1 21 0.0 8
4    1 22 1.0 8
5    1 25 4.0 5
6    1 25 3.5 8
7    1 25 4.0 8
8    1 30 8.0 5
9    1 31 8.0 8
10   1 31 9.0 8
> class(X)
[1] "data.frame"
> X = data.matrix(X)
> class(X)
[1] "matrix"
> X
      one  a   e q
 [1,]   1 21 0.0 5
 [2,]   1 22 1.0 5
 [3,]   1 21 0.0 8
 [4,]   1 22 1.0 8
 [5,]   1 25 4.0 5
 [6,]   1 25 3.5 8
 [7,]   1 25 4.0 8
 [8,]   1 30 8.0 5
 [9,]   1 31 8.0 8
[10,]   1 31 9.0 8
> dim(X)
[1] 10  4
> 
> Y = matrix(df$income)
> Y
       [,1]
 [1,] 20000
 [2,] 25000
 [3,] 25000
 [4,] 30000
 [5,] 40000
 [6,] 47000
 [7,] 50000
 [8,] 80000
 [9,] 91000
[10,] 95000
> 



> coeffs = function(x,y){
+      l = solve(t(x)%*%x)
+      r = t(x)%*%y
+      l%*%r
+ }
> beta = coeffs(X,Y)
> beta
           [,1]
one -155369.530
a      7774.047
e     -1078.645
q      1932.194
> 
Deriving coefficients using predefined functions:
> lmfit = lm(income ~ age + exp + q , data=df)
> beta = cbind(coefficients(lmfit))
> beta
                   [,1]
(Intercept) -155369.530
age            7774.047
exp           -1078.645
q              1932.194
> 
Quadratic polynomial model (to fit a non-linear relationship using a linear model, via squared terms).
 a=df$age
 e = df$exp
 q = df$q
 a2=a^2
 e2 = e^2
 q2 = q^2
i=df$income
beta2 =  cbind(coefficients(lm(i ~a+ a2 + e+e2+q+q2)))
> # deriving cubic polynomial coefficients.
> a3 = a^3
> e3 = e^3
> q3 = q^3
> beta3 = cbind(coefficients(lm(i ~ a + a2 + a3 +
+                                   e + e2 + e3 +
+                                   q + q2 + q3)))
> 
> beta3
                    [,1]
(Intercept) 2691084.7350
a           -304049.7494
a2            11227.3276
a3             -134.8921
e             15809.6330
e2            -2587.9205
e3              148.3180
q              2222.2222
q2                    NA
q3                    NA
> 
Note the NA coefficients for q2 and q3: q takes only the two values 5 and 8, so q^2 and q^3 are exact linear functions of q, and lm() drops them as collinear.

In the next class we will see how to measure the accuracy of each model.


Machine Learning Session 15: 

    Developing a predict function and testing accuracy.

> predict = function(x,b){
+   ycap = x %*% b
+   ycap
+ }

input matrix. 
>  X = cbind(1, df$age, df$exp, df$q)


> dim(X)
[1] 10  4
> X
      [,1] [,2] [,3] [,4]
 [1,]    1   21  0.0    5
 [2,]    1   22  1.0    5
 [3,]    1   21  0.0    8
 [4,]    1   22  1.0    8
 [5,]    1   25  4.0    5
 [6,]    1   25  3.5    8
 [7,]    1   25  4.0    8
 [8,]    1   30  8.0    5
 [9,]    1   31  8.0    8
[10,]    1   31  9.0    8


> ycap = predict(X, beta)
> dim(ycap)
[1] 10  1
> ycap
          [,1]
 [1,] 17546.43
 [2,] 24241.83
 [3,] 23343.01
 [4,] 30038.42
 [5,] 44328.04
 [6,] 50663.94
 [7,] 50124.62
 [8,] 78883.70
 [9,] 92454.33
[10,] 91375.68

How to test the accuracy of model predictions:

Accuracy is based on the closeness between each actual
target value and its predicted value.

ex:
  a is the actual age
  acap is the predicted age

  Then the distance between actual and predicted:

> a=25
> acap=26
> abs(a-acap)/acap * 100
[1] 3.846154

here the distance is about 3.85%

Then the closeness between actual and predicted is:

  100 - distance


> 100 - abs(a-acap)/acap * 100
[1] 96.15385

Here the closeness is about 96%; if the expected closeness is 90%,
then the above prediction is good.

Let's apply this accuracy measurement to our predictions.

ex: Y (actual incomes) and ycap (predicted incomes)

> Y = cbind(df$income)
> Y
       [,1]
 [1,] 20000
 [2,] 25000
 [3,] 25000
 [4,] 30000
 [5,] 40000
 [6,] 47000
 [7,] 50000
 [8,] 80000
 [9,] 91000
[10,] 95000

> ycap
          [,1]
 [1,] 17546.43
 [2,] 24241.83
 [3,] 23343.01
 [4,] 30038.42
 [5,] 44328.04
 [6,] 50663.94
 [7,] 50124.62
 [8,] 78883.70
 [9,] 92454.33
[10,] 91375.68

> dist = abs(Y-ycap)/ycap * 100
> dist
            [,1]
 [1,] 13.9832956
 [2,]  3.1275150
 [3,]  7.0984295
 [4,]  0.1278863
 [5,]  9.7636619
 [6,]  7.2318577
 [7,]  0.2486241
 [8,]  1.4151260
 [9,]  1.5730205
[10,]  3.9663939

> closeness = 100 - dist
> closeness
          [,1]
 [1,] 86.01670
 [2,] 96.87248
 [3,] 92.90157
 [4,] 99.87211
 [5,] 90.23634
 [6,] 92.76814
 [7,] 99.75138
 [8,] 98.58487
 [9,] 98.42698
[10,] 96.03361

> closeness>=90
       [,1]
 [1,] FALSE
 [2,]  TRUE
 [3,]  TRUE
 [4,]  TRUE
 [5,]  TRUE
 [6,]  TRUE
 [7,]  TRUE
 [8,]  TRUE
 [9,]  TRUE
[10,]  TRUE

> closeness[closeness>=90]
[1] 96.87248 92.90157 99.87211 90.23634 92.76814 99.75138 98.58487
[8] 98.42698 96.03361
> length(closeness[closeness>=90])
[1] 9
> pcnt = length(closeness[closeness>=90])
> pcnt
[1] 9
> n = length(Y)
> n
[1] 10
> acc = pcnt/n * 100
> acc
[1] 90

Developing a function for accuracy testing.
---------------------------------------

> accuracy = function(y,ycap,closeness){
+     de = 100 - closeness
+     dist = abs(y-ycap)/ycap * 100
+     pcnt = length(dist[dist<=de])
+     n = length(y)
+     acc = pcnt/n * 100
+     acc
+ }
> accuracy(Y, ycap, 80)
[1] 100
> accuracy(Y, ycap, 85)
[1] 100
> accuracy(Y, ycap, 90)
[1] 90
> accuracy(Y, ycap, 95)
[1] 60

-------------------------------

Machine Learning Session 16:

Topic: Implementation of Regression models with Python NumPy.



  Linear Regression implementation with Python NumPy.
--------------------------------------------

 Data file:  profiles.txt

"name","age","exp","qual","income"
aaa,21,0,btech,20000
bbbb,22,1,btech,25000
ccc,21,0,mtech,25000
dddd,22,1,mtech,30000
ee,25,4,btech,40000
ffff,25,3.5,mtech,47000
gggg,25,4,mtech,50000
hhh,30,8,btech,80000
jjjj,31,8,mtech,91000
eejj,31,9,mtech,95000

# read the data file and skip the header row
# (path assumed; adjust to where profiles.txt lives)
lines = open('profiles.txt').readlines()[1:]

a = []
e = []
q = []
i = []
for line in lines:
    w = line.strip().split(',')
    age = float(w[1])
    exp = float(w[2])
    ql = 5                   # btech -> 5
    if w[3] == 'mtech':
        ql = 8               # mtech -> 8
    inc = float(w[-1])/1000  # income in thousands
    a.append(age)
    e.append(exp)
    q.append(ql)
    i.append(inc)
print(a)
print(e)
print(q)
print(i)
output:

[21.0, 22.0, 21.0, 22.0, 25.0, 25.0, 25.0, 30.0, 31.0, 31.0]
[0.0, 1.0, 0.0, 1.0, 4.0, 3.5, 4.0, 8.0, 8.0, 9.0]
[5, 5, 8, 8, 5, 8, 8, 5, 8, 8]
[20.0, 25.0, 25.0, 30.0, 40.0, 47.0, 50.0, 80.0, 91.0, 95.0]
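As an alternative to hand-splitting each line, Python's built-in csv module parses the quoted header row directly. A minimal sketch, with the file contents inlined through io.StringIO so it runs standalone (reading from profiles.txt would work the same way with open()):

```python
import csv
import io

# profiles.txt contents inlined for a self-contained example
data = '''"name","age","exp","qual","income"
aaa,21,0,btech,20000
bbbb,22,1,btech,25000
ccc,21,0,mtech,25000
dddd,22,1,mtech,30000
ee,25,4,btech,40000
ffff,25,3.5,mtech,47000
gggg,25,4,mtech,50000
hhh,30,8,btech,80000
jjjj,31,8,mtech,91000
eejj,31,9,mtech,95000
'''

a, e, q, i = [], [], [], []
reader = csv.DictReader(io.StringIO(data))  # field names come from the header row
for row in reader:
    a.append(float(row['age']))
    e.append(float(row['exp']))
    q.append(8 if row['qual'] == 'mtech' else 5)  # encode qual: btech -> 5, mtech -> 8
    i.append(float(row['income']) / 1000)         # income in thousands

print(q)
print(i)
```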


# preparing input matrix. 

import numpy as np
X = np.c_[np.ones(len(a)),a,e,q]
print(X)

[[ 1.  21.   0.   5. ]
 [ 1.  22.   1.   5. ]
 [ 1.  21.   0.   8. ]
 [ 1.  22.   1.   8. ]
 [ 1.  25.   4.   5. ]
 [ 1.  25.   3.5  8. ]
 [ 1.  25.   4.   8. ]
 [ 1.  30.   8.   5. ]
 [ 1.  31.   8.   8. ]
 [ 1.  31.   9.   8. ]]


# output(target) matrix
Y = np.c_[i]
print(Y)

[[20.]
 [25.]
 [25.]
 [30.]
 [40.]
 [47.]
 [50.]
 [80.]
 [91.]
 [95.]]

def coeffs(x,y):
    from numpy.linalg import inv
    l = inv(x.T.dot(x))
    r = x.T.dot(y)
    return l.dot(r)


# coefficients of Linear model. 

beta1 = coeffs(X,Y)
print(beta1)

[[-155.36953015]
 [   7.77404719]
 [  -1.07864489]
 [   1.93219399]]
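Explicitly inverting X.T.dot(X) works here, but numpy's np.linalg.lstsq solves the same least-squares problem without forming the inverse and is numerically more stable. A sketch (the data matrices are rebuilt inline so the snippet stands alone):

```python
import numpy as np

# rebuild the input and target matrices from the profiles data
a = [21, 22, 21, 22, 25, 25, 25, 30, 31, 31]
e = [0, 1, 0, 1, 4, 3.5, 4, 8, 8, 9]
q = [5, 5, 8, 8, 5, 8, 8, 5, 8, 8]
i = [20, 25, 25, 30, 40, 47, 50, 80, 91, 95]  # income in thousands

X = np.c_[np.ones(len(a)), a, e, q]
Y = np.c_[i]

# least-squares solve; no explicit inv(X.T @ X)
beta, residuals, rank, sv = np.linalg.lstsq(X, Y, rcond=None)
print(beta)   # matches the normal-equations coefficients above
```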


# preparing input matrix for quadratic model. 

ones = np.ones(len(a))
a = np.array(a)
a2 = a**2        # note: 'as' is a reserved word in Python, so a2
e = np.array(e)
es = e**2
q = np.array(q)
qs = q**2

XX = np.c_[ones, a, a2, e, es, q, qs]
print(XX)

[[  1.    21.   441.     0.     0.     5.    25.  ]
 [  1.    22.   484.     1.     1.     5.    25.  ]
 [  1.    21.   441.     0.     0.     8.    64.  ]
 [  1.    22.   484.     1.     1.     8.    64.  ]
 [  1.    25.   625.     4.    16.     5.    25.  ]
 [  1.    25.   625.     3.5   12.25   8.    64.  ]
 [  1.    25.   625.     4.    16.     8.    64.  ]
 [  1.    30.   900.     8.    64.     5.    25.  ]
 [  1.    31.   961.     8.    64.     8.    64.  ]
 [  1.    31.   961.     9.    81.     8.    64.  ]]


# coefficients for quadratic model. 

beta2 = coeffs(XX, Y)
print(beta2)

[[ 4.27008000e+05]
 [-3.17610019e+04]
 [ 5.43459599e+02]
 [ 9.43313010e+03]
 [-6.16516751e+02]
 [ 0.00000000e+00]
 [ 4.00000000e+00]]
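The odd-looking q terms (0.0 and 4.0) are a warning sign. q only takes the values 5 and 8, so q**2 is exactly the linear function 13*q - 40, which makes the columns of XX linearly dependent; XX.T.dot(XX) is then singular, and the inverse computed by inv() (and hence beta2, and likewise beta3 below) is numerically meaningless. A quick check of the rank, with the arrays rebuilt inline:

```python
import numpy as np

a = np.array([21, 22, 21, 22, 25, 25, 25, 30, 31, 31])
e = np.array([0, 1, 0, 1, 4, 3.5, 4, 8, 8, 9])
q = np.array([5, 5, 8, 8, 5, 8, 8, 5, 8, 8])

XX = np.c_[np.ones(len(a)), a, a**2, e, e**2, q, q**2]

# q takes only two values, so q**2 is an exact linear function of q
assert np.allclose(q**2, 13*q - 40)

# the design matrix has fewer independent columns than columns
print(np.linalg.matrix_rank(XX), XX.shape[1])   # rank < number of columns
```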


# input matrix for cubic model

a3 = a**3
e3 = e**3
q3 = q**3

XXX = np.c_[ones, a,a2,a3,e,es,e3,q,qs,q3]
print(XXX)


# coefficients  for cubic model

beta3 = coeffs(XXX, Y)
print(beta3)

[[ 5.46816000e+05]
 [-6.54644536e+04]
 [ 2.48430214e+03]
 [-3.09940604e+01]
 [ 2.98121043e+03]
 [-6.74362346e+02]
 [ 3.80136769e+01]
 [ 4.06400000e+03]
 [ 1.46000000e+02]
 [-4.60000000e+01]]


def predict(x,b):
    return x.dot(b)

ycap1 = predict(X, beta1)
ycap2 = predict(XX, beta2)
ycap3 = predict(XXX, beta3)


# predictions by linear model
print(ycap1)

[[17.54643073]
 [24.24183303]
 [23.3430127 ]
 [30.038415  ]
 [44.32803993]
 [50.66394434]
 [50.1246219 ]
 [78.88369631]
 [92.45432547]
 [91.37568058]]

# predictions by quadratic model

print(ycap2)
[[ -207.35603075]
 [  217.01820823]
 [  -51.35603075]
 [  373.01820823]
 [  613.45510333]
 [-1635.17213331]
 [  769.45510333]
 [ -600.44815338]
 [  945.58551178]
 [ -102.06914593]]


# predictions by cubic model

print(ycap3)
[[-1176.27672812]
 [ -459.63846022]
 [-1092.27672812]
 [ -375.63846022]
 [  399.22115776]
 [  718.43581996]
 [  483.22115776]
 [  288.18008023]
 [  -54.26574259]
 [ -288.24732376]]
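The accuracy figures reported below can be reproduced by porting the Session 15 accuracy function to numpy. A sketch; note that the relative distance here is taken against the actual value y (dividing by ycap, as in the R version, would give 90 rather than 80 for the linear model):

```python
import numpy as np

def accuracy(y, ycap, closeness):
    de = 100 - closeness                        # allowed relative distance, in percent
    dist = np.abs(y - ycap) / np.abs(y) * 100   # distance relative to the actual value
    return np.sum(dist <= de) / len(y) * 100

# actual and linear-model predicted incomes (in thousands), rebuilt inline
Y = np.c_[[20, 25, 25, 30, 40, 47, 50, 80, 91, 95]].astype(float)
ycap1 = np.c_[[17.54643073, 24.24183303, 23.3430127, 30.038415, 44.32803993,
               50.66394434, 50.1246219, 78.88369631, 92.45432547, 91.37568058]]

print(accuracy(Y, ycap1, 90))   # 80.0
```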


Accuracy of the linear model:    80.0
Accuracy of the quadratic model:  0.0
Accuracy of the cubic model:      0.0

The linear model is the best fit for the given data.
--------------------------------

The above are 16 Machine Learning sessions, including today's.




