INDEX

Subject:- CA LAB-VII(A): LAB on Machine Learning

Sr.No.  Name of the Practical                                                                  Date    Remark
1       Introduction to PyCharm, the Pandas library, DataFrames, and loading a CSV file into a DataFrame.
2       Implement the Find-S inductive learning algorithm.
3       Implement the Candidate-Elimination inductive learning algorithm.
4       Write a program for linear regression and find parameters like Mean Squared Error.
5.1     Write a program to implement a decision tree using the Python/R/programming language of your choice.
5.2     Write a program to calculate popular attribute selection measures (ASM) like Information Gain, Gain Ratio, and Gini Index for a decision tree.
6       Implement simple KNN using Euclidean distance in Python.
7       Write a program to implement the k-Nearest Neighbour algorithm to classify the iris dataset. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.
8       Write a program for a confusion matrix and calculate Precision, Recall, and F-Measure.
9       Write a program for linear regression and find parameters like Sum of Squared Errors (SSE), Total Sum of Squares (SST), R2, Adjusted R2, etc.
10      Write a program to implement the naïve Bayesian classifier for a sample training dataset stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
11.1    Implement Agglomerative Clustering in Python.
11.2    Write a program for Fuzzy c-means clustering in Python.
12      Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
13.1    Build a simple Artificial Neural Network.
13.2    Build an Artificial Neural Network by implementing the Backpropagation algorithm and test it using appropriate data sets.

Practical – 1: Introduction to PyCharm, the Pandas Library, DataFrames, and Loading a CSV File into a DataFrame

import pandas as pd

# pd.__version__  # prints the installed pandas version

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]}, index=[0, 1, 2])
print("df1:\n", df1)

df2 = pd.DataFrame({"B": [4, 5, 7], "C": ["x", "y", "z"]}, index=[4, 5, 6])
print("\ndf2:\n", df2)

# combine_first fills the missing values of df1 with values from df2
df3 = df1.combine_first(df2)
print("\ncombination of df1 and df2:\n", df3)

# build a DataFrame from individual Series
classes = pd.Series(["mathematics", "chemistry", "physics", "history", "geography", "german"])
grades = pd.Series([90, 54, 77, 22, 25, 40])
year = pd.Series([2015, 2016, 2017, 2018, 2019, 2020])
df4 = pd.DataFrame({"Classes": classes, "Grades": grades, "Year": year})
print("\n", df4)

# load the uploaded .csv file into a DataFrame
data_frame = pd.read_csv("C:/Users/sejal/PycharmProjects/dataset.csv")
print("\n", data_frame)

OUTPUT:
C:\Users\sejal\MCA-I_ML\Scripts\python.exe C:/Users/sejal/PycharmProjects/MCA-I_ML/1_prat.py
df1:
    A  B
0  1  2
1  2  3
2  3  4

df2:
    B  C
4  4  x
5  5  y
6  7  z

combination of df1 and df2:
      A  B    C
0  1.0  2  NaN
1  2.0  3  NaN
2  3.0  4  NaN
4  NaN  4    x
5  NaN  5    y
6  NaN  7    z

        Classes  Grades  Year
0   mathematics      90  2015
1     chemistry      54  2016
2       physics      77  2017
3       history      22  2018
4     geography      25  2019
5        german      40  2020

      sky  temp humidity water    wind forcast enjoy-sport
0   sunny  warm     high  cool  strong    same         yes
1   sunny  warm     high  warm  strong    same         yes
2   rainy  cold      low  warm    weak  change          no
3   rainy  cold     high  warm    weak  change          no
4   sunny  warm     high  warm  strong    same         yes
5   sunny  cold     high  warm  strong    same          no

Practical – 2: Implement the Find-S Inductive Learning Algorithm
import pandas as pd
import numpy as np

# read the training data from the CSV file
data = pd.read_csv("C:/Users/comp273/Desktop/pract1ML.csv")
print("The Data-set For Enjoy Sport Example is:- ")
print(data)

# make an array of all the attributes (every column except the last)
d = np.array(data)[:, :-1]
print("\nThe Attributes are :- ")
print(d)

# segregate the target column that holds the positive and negative examples
target = np.array(data)[:, -1]
print("\nThe Target is :- ")
print(target)

# Find-S algorithm: start from the first positive example and
# generalise the specific hypothesis over the remaining positives
def train(c, t):
    for i, val in enumerate(t):
        if val == "yes":
            sp_hp = c[i].copy()
            break
    print("\nInitial Hypothesis:- ")
    print(sp_hp, "\n")
    for i, val in enumerate(c):
        if t[i] == "yes":
            for x in range(len(sp_hp)):
                if sp_hp[x] != val[x]:
                    sp_hp[x] = "?"
        print("Hypothesis is:- ", i, "= ", sp_hp)
    return sp_hp

print("\nFinal Hypothesis is :- ", train(d, target))

OUTPUT:
C:\Users\comp273\PycharmProjects\ML_107\venv\Scripts\python.exe C:/Users/comp273/PycharmProjects/ML_107/find_s_algo.py
The Data-set For Enjoy Sport Example is:-
     Sky AirTemp Humidity    Wind Water Forcast EnjoySport
0  sunny    warm   normal  strong  warm    same        yes
1  sunny    warm     high  strong  warm    same        yes
2  sunny    cold     high  strong  warm  change        yes
3  rainy    cold   normal  strong  cool  change         no
4  sunny    cold     high    weak  warm  change         no
5  sunny    cold   normal    weak  warm    same        yes
6  rainy    warm     high    weak  cool  change         no

The Attributes are :-
[['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
 ['sunny' 'warm' 'high' 'strong' 'warm' 'same']
 ['sunny' 'cold' 'high' 'strong' 'warm' 'change']
 ['rainy' 'cold' 'normal' 'strong' 'cool' 'change']
 ['sunny' 'cold' 'high' 'weak' 'warm' 'change']
 ['sunny' 'cold' 'normal' 'weak' 'warm' 'same']
 ['rainy' 'warm' 'high' 'weak' 'cool' 'change']]

The Target is :-
['yes' 'yes' 'yes' 'no' 'no' 'yes' 'no']

Initial Hypothesis:-
['sunny' 'warm' 'normal' 'strong' 'warm' 'same']

Hypothesis is:-  0 =  ['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
Hypothesis is:-  1 =  ['sunny' 'warm' '?' 'strong' 'warm' 'same']
Hypothesis is:-  2 =  ['sunny' '?' '?' 'strong' 'warm' '?']
Hypothesis is:-  3 =  ['sunny' '?' '?' 'strong' 'warm' '?']
Hypothesis is:-  4 =  ['sunny' '?' '?' 'strong' 'warm' '?']
Hypothesis is:-  5 =  ['sunny' '?' '?' '?' 'warm' '?']
Hypothesis is:-  6 =  ['sunny' '?' '?' '?' 'warm' '?']

Final Hypothesis is :-  ['sunny' '?' '?' '?' 'warm' '?']

Process finished with exit code 0

Practical – 3: Implement the Candidate-Elimination Inductive Learning Algorithm

import numpy as np
import pandas as pd

data = pd.read_csv("C:/Users/sejal/OneDrive/Desktop/FyMca Sem II Notes/"
                   "Practical Practice/CA LAB-VII(A) ML/Enjoy-sportExample.csv")
concepts = np.array(data.iloc[:, 0:-1])
print("\nInstances are:\n", concepts)
target = np.array(data.iloc[:, -1])
print("\nTarget Values are: ", target)

def learn(concepts, target):
    # the specific boundary starts as the first training instance
    specific_h = concepts[0].copy()
    print("\nInitialization of Specific_Hypothesis and General_Hypothesis")
    print("\nSpecific Boundary: ", specific_h)
    # the general boundary starts as the most general hypothesis
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print("\nGeneric Boundary: ", general_h)
    for i, h in enumerate(concepts):
        print("Instance", i + 1, "is ", h)
        if target[i] == "yes":
            print("Instance is Positive ")
            # generalise the specific boundary on every mismatched attribute
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        else:
            print("Instance is Negative ")
            # specialise the general boundary against the negative instance
            for x in range(len(specific_h)):
                if h[x] != specific_h[x] and specific_h[x] != '?':
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
print("Specific Boundary after ", i+1, "Instance is ", specific_h) print("Generic Boundary after ", i+1, "Instance is ", general_h) print("\n") indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']] for i in indices: general_h.remove(['?', '?', '?', '?', '?', '?']) return specific_h, general_h s_final, g_final = learn(concepts, target) print("Final Specific_Hypothesis: ", s_final, sep="\n") print("Final General_Hypothesis: ", g_final, sep="\n") OUTPUT: C:\Users\sejal\MCA-I_ML\Scripts\python.exe C:/Users/sejal/PycharmProjects/MCA-I_ML/candidate_elimination.py Instances are: [['sunny' 'warm' 'normal' 'strong' 'warm' 'same'] ['sunny' 'warm' 'high' 'strong' 'warm' 'same'] ['sunny' 'cold' 'high' 'strong' 'warm' 'change'] ['rainy' 'cold' 'normal' 'strong' 'cool' 'change'] ['sunny' 'cold' 'high' 'weak' 'warm' 'change'] ['sunny' 'cold' 'normal' 'weak' 'warm' 'same'] ['rainy' 'warm' 'high' 'weak' 'cool' 'change']] Target Values are: ['yes' 'yes' 'yes' 'no' 'no' 'yes' 'no'] Initialization of Specific_Hypothesis and General_Hypothesis Specific Boundary: ['sunny' 'warm' 'normal' 'strong' 'warm' 'same'] Generic Boundary: [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']] Instance 1 is ['sunny' 'warm' 'normal' 'strong' 'warm' 'same'] Instance is Positive Specific Boundary after 1 Instance is ['sunny' 'warm' 'normal' 'strong' 'warm' 'same'] Generic Boundary after 1 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']] Instance 2 is ['sunny' 'warm' 'high' 'strong' 'warm' 'same'] Instance is Positive Specific Boundary after 2 Instance is ['sunny' 'warm' '?' 'strong' 'warm' 'same'] Generic Boundary after 2 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']] Instance 3 is ['sunny' 'cold' 'high' 'strong' 'warm' 'change'] Instance is Positive Specific Boundary after 3 Instance is ['sunny' '?' '?' 'strong' 'warm' '?'] Generic Boundary after 3 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']] Instance 4 is ['rainy' 'cold' 'normal' 'strong' 'cool' 'change'] Instance is Negative Specific Boundary after 4 Instance is ['sunny' '?' '?' 'strong' 'warm' '?'] Generic Boundary after 4 Instance is [['sunny', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', 'warm', '?'], ['?', '?', '?', '?', '?', '?']] Instance 5 is ['sunny' 'cold' 'high' 'weak' 'warm' 'change'] Instance is Negative Specific Boundary after 5 Instance is ['sunny' '?' '?' 'strong' 'warm' '?'] Generic Boundary after 5 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', 'strong', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']] Instance 6 is ['sunny' 'cold' 'normal' 'weak' 'warm' 'same'] Instance is Positive Specific Boundary after 6 Instance is ['sunny' '?' '?' '?' 
'warm' '?'] Generic Boundary after 6 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']] Instance 7 is ['rainy' 'warm' 'high' 'weak' 'cool' 'change'] Instance is Negative Specific Boundary after 7 Instance is ['sunny' '?' '?' '?' 'warm' '?'] Generic Boundary after 7 Instance is [['sunny', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', 'warm', '?'], ['?', '?', '?', '?', '?', '?']] Final Specific_Hypothesis: ['sunny' '?' '?' '?' 'warm' '?'] Final General_Hypothesis: [['sunny', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', 'warm', '?']] Process finished with exit code 0 Practical - 4.: Finding the Estimated coefficient and regression coeficiant import numpy as np def estimated_coef(x, y): # number of observation\points n = np.size(x) # mean of x and y vector m_x = np.mean(x) m_y = np.mean(y) # calculating cross deviation and deviation about x ss_xy = np.sum(y * x) - n * m_y * m_x ss_xx = np.sum(x * x) - n * m_x * m_x # calculating regression coefficients b_1 = ss_xy / ss_xx b_0 = m_y - b_1 * m_x return (b_0, b_1) def main(): # observations/data x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12, 15]) # estimating coefficients b = estimated_coef(x, y) print("Estimated coefficients :-\n b_0 = {} \n b_1 = {}".format(b[0], b[1])) y_pred = b[0] + b[1] * x print("x input :", x) print("original y : ", y_pred) e = y - y_pred merror = np.sum(e*e) n = np.size(x) print("mean square error = ", merror/(2 * n)) if __name__ == "__main__": main() OUTPUT: C:\Users\comp\mca107\venv\Scripts\python.exe C:/Users/comp/mca107/ml_pract4.py Estimated coefficients :- b_0 = 0.9545454545454541 b_1 = 1.2636363636363637 x input : [ 0 1 2 3 4 5 6 7 8 9 10] original y : [ 0.95454545 2.21818182 3.48181818 4.74545455 6.00909091 7.27272727 8.53636364 9.8 11.06363636 12.32727273 13.59090909] mean square error = 0.38801652892561994 Practical - 5.1: Write a program to implement Decision tree using the Python/R/Programming language of your choice import matplotlib.pyplot as plt import pandas as pd from sklearn.datasets import load_iris # load_iris data_b = load_iris() # lo df = pd.DataFrame(data_b.data, columns=data_b.feature_names) df['target'] = data_b.target # df['target'] print(df) print("Dataset Labels=", data_b.target_names) from sklearn.tree import DecisionTreeClassifier from sklearn import metrics from sklearn.model_selection import train_test_split # import numpy as np from sklearn import tree X_train, X_test, Y_train, y_test = train_test_split(df[data_b.feature_names], df['target'], random_state=1) print(X_train) print(X_test) print(Y_train) print(y_test) clf = DecisionTreeClassifier(max_depth=5, random_state=1, criterion='gini') # 'gini'/'entropy' clf.fit(X_train, Y_train) y_pred = clf.predict(X_test) print(y_test, y_pred) print("Accuracy: ", metrics.accuracy_score(y_test, y_pred)) # tree.plot_tree(clf) fn = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'] cn = ['setosa', 'versicolor', 'virginica'] fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(4, 4), dpi=300) tree.plot_tree(clf, feature_names=fn, class_names=cn, filled=True); fig.savefig('Dicision_tree.png') OUTPUT: C:\Users\sejal\MCA-I_ML\Scripts\python.exe C:/Users/sejal/PycharmProjects/MCA-I_ML/Decision_tree.py sepal length (cm) sepal width 
     sepal length (cm)  sepal width (cm)  ...  petal width (cm)  target
0                  5.1               3.5  ...               0.2       0
1                  4.9               3.0  ...               0.2       0
2                  4.7               3.2  ...               0.2       0
3                  4.6               3.1  ...               0.2       0
4                  5.0               3.6  ...               0.2       0
..                 ...               ...  ...               ...     ...
145                6.7               3.0  ...               2.3       2
146                6.3               2.5  ...               1.9       2
147                6.5               3.0  ...               2.0       2
148                6.2               3.4  ...               2.3       2
149                5.9               3.0  ...               1.8       2

[150 rows x 5 columns]
Dataset Labels= ['setosa' 'versicolor' 'virginica']
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
54                 6.5               2.8                4.6               1.5
108                6.7               2.5                5.8               1.8
112                6.8               3.0                5.5               2.1
17                 5.1               3.5                1.4               0.3
119                6.0               2.2                5.0               1.5
..                 ...               ...                ...               ...
133                6.3               2.8                5.1               1.5
137                6.4               3.1                5.5               1.8
72                 6.3               2.5                4.9               1.5
140                6.7               3.1                5.6               2.4
37                 4.9               3.6                1.4               0.1

[112 rows x 4 columns]
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
14                 5.8               4.0                1.2               0.2
98                 5.1               2.5                3.0               1.1
75                 6.6               3.0                4.4               1.4
16                 5.4               3.9                1.3               0.4
131                7.9               3.8                6.4               2.0
56                 6.3               3.3                4.7               1.6
141                6.9               3.1                5.1               2.3
44                 5.1               3.8                1.9               0.4
29                 4.7               3.2                1.6               0.2
120                6.9               3.2                5.7               2.3
94                 5.6               2.7                4.2               1.3
5                  5.4               3.9                1.7               0.4
102                7.1               3.0                5.9               2.1
51                 6.4               3.2                4.5               1.5
78                 6.0               2.9                4.5               1.5
42                 4.4               3.2                1.3               0.2
92                 5.8               2.6                4.0               1.2
66                 5.6               3.0                4.5               1.5
31                 5.4               3.4                1.5               0.4
35                 5.0               3.2                1.2               0.2
90                 5.5               2.6                4.4               1.2
84                 5.4               3.0                4.5               1.5
77                 6.7               3.0                5.0               1.7
40                 5.0               3.5                1.3               0.3
125                7.2               3.2                6.0               1.8
99                 5.7               2.8                4.1               1.3
33                 5.5               4.2                1.4               0.2
19                 5.1               3.8                1.5               0.3
73                 6.1               2.8                4.7               1.2
146                6.3               2.5                5.0               1.9
91                 6.1               3.0                4.6               1.4
135                7.7               3.0                6.1               2.3
69                 5.6               2.5                3.9               1.1
128                6.4               2.8                5.6               2.1
114                5.8               2.8                5.1               2.4
48                 5.3               3.7                1.5               0.2
53                 5.5               2.3                4.0               1.3
28                 5.2               3.4                1.4               0.2
54     1
108    2
112    2
17     0
119    2
      ..
133    2
137    2
72     1
140    2
37     0
Name: target, Length: 112, dtype: int32
14     0
98     1
75     1
16     0
131    2
56     1
141    2
44     0
29     0
120    2
94     1
5      0
102    2
51     1
78     1
42     0
92     1
66     1
31     0
35     0
90     1
84     1
77     1
40     0
125    2
99     1
33     0
19     0
73     1
146    2
91     1
135    2
69     1
128    2
114    2
48     0
53     1
28     0
Name: target, dtype: int32
14     0
98     1
75     1
16     0
131    2
56     1
141    2
44     0
29     0
120    2
94     1
5      0
102    2
51     1
78     1
42     0
92     1
66     1
31     0
35     0
90     1
84     1
77     1
40     0
125    2
99     1
33     0
19     0
73     1
146    2
91     1
135    2
69     1
128    2
114    2
48     0
53     1
28     0
Name: target, dtype: int32
[0 1 1 0 2 1 2 0 0 2 1 0 2 1 1 0 1 1 0 0 1 1 2 0 2 1 0 0 1 2 1 2 1 2 2 0 1 0]
Accuracy:  0.9736842105263158

Practical – 5.2: Write a program to calculate popular attribute selection measures (ASM) like Information Gain, Gain Ratio, and Gini Index for a decision tree.
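No code was recorded for this practical in the journal. The following is a minimal sketch of how the three measures can be computed for a single categorical split; the toy outlook/play data frame is an illustrative assumption, not the lab's CSV.

import numpy as np
import pandas as pd

def entropy(labels):
    # H(S) = -sum(p * log2(p)) over the class proportions
    p = labels.value_counts(normalize=True)
    return -np.sum(p * np.log2(p))

def information_gain(df, attr, target):
    # IG = H(S) - sum(|Sv|/|S| * H(Sv)) over the values v of attr
    total = entropy(df[target])
    weighted = sum(len(sub) / len(df) * entropy(sub[target])
                   for _, sub in df.groupby(attr))
    return total - weighted

def gain_ratio(df, attr, target):
    # GainRatio = IG / SplitInfo, where SplitInfo is the entropy of the attribute itself
    split_info = entropy(df[attr])
    return information_gain(df, attr, target) / split_info if split_info else 0.0

def gini_index(df, attr, target):
    # weighted Gini impurity of the split: sum(|Sv|/|S| * (1 - sum(p^2)))
    def gini(sub):
        p = sub[target].value_counts(normalize=True)
        return 1 - np.sum(p ** 2)
    return sum(len(sub) / len(df) * gini(sub) for _, sub in df.groupby(attr))

# illustrative toy data (assumed, not from the lab's dataset)
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rainy", "rainy", "overcast"],
    "play":    ["no", "no", "yes", "yes", "no", "yes"],
})
print("Information Gain:", information_gain(df, "outlook", "play"))
print("Gain Ratio      :", gain_ratio(df, "outlook", "play"))
print("Gini Index      :", gini_index(df, "outlook", "play"))

In a full decision-tree builder these scores would be computed for every candidate attribute at each node, and the best-scoring attribute chosen as the split.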
Practical No: 6
Practical Name: Implement simple KNN using Euclidean distance in Python.
------------------------------------------------------------------------------------------------
Code: KNN using Euclidean distance

from pandas import DataFrame
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

data_b = load_iris()
df = DataFrame(data_b.data, columns=data_b.feature_names)
df['target'] = data_b.target
# print(df)
# print(data_b.DESCR)
print("Dataset Labels=", data_b.target_names)

X_train, X_test, Y_train, y_test = train_test_split(df[data_b.feature_names], df['target'], random_state=1)
print(X_train.head(6))
print(Y_train.head(6))
print(X_test.head())

clf = KNeighborsClassifier(n_neighbors=6)  # uses the Euclidean (Minkowski p=2) metric by default
clf.fit(X_train, Y_train)  # model is trained
y_pred = clf.predict(X_test)
# print(y_test, y_pred)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

OUTPUT:
C:\Users\sejal\MCA-I_ML\Scripts\python.exe C:/Users/sejal/PycharmProjects/MCA-I_ML/KNN.py
Dataset Labels= ['setosa' 'versicolor' 'virginica']
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
54                 6.5               2.8                4.6               1.5
108                6.7               2.5                5.8               1.8
112                6.8               3.0                5.5               2.1
17                 5.1               3.5                1.4               0.3
119                6.0               2.2                5.0               1.5
103                6.3               2.9                5.6               1.8
54     1
108    2
112    2
17     0
119    2
103    2
Name: target, dtype: int32
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
14                 5.8               4.0                1.2               0.2
98                 5.1               2.5                3.0               1.1
75                 6.6               3.0                4.4               1.4
16                 5.4               3.9                1.3               0.4
131                7.9               3.8                6.4               2.0
Accuracy: 1.0
Confusion Matrix:
[[13  0  0]
 [ 0 16  0]
 [ 0  0  9]]

Process finished with exit code 0
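The program above relies on scikit-learn, whose KNeighborsClassifier applies the Euclidean metric by default. To match the practical's title more literally, a from-scratch KNN with an explicit Euclidean distance might look like the sketch below; the four-point training set and the helper names are illustrative assumptions.

import numpy as np
from collections import Counter

def euclidean(a, b):
    # straight-line distance between two feature vectors
    return np.sqrt(np.sum((a - b) ** 2))

def knn_predict(X_train, y_train, x_query, k=3):
    # distance from the query point to every training point
    distances = [euclidean(x, x_query) for x in X_train]
    # indices of the k nearest neighbours
    nearest = np.argsort(distances)[:k]
    # majority vote among the neighbours' labels
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# tiny illustrative dataset (assumed): two classes in 2-D
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [3.0, 3.2], [3.1, 2.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.9, 3.0]), k=3))  # expected: 1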
####################################################################
Code: For Breast Cancer Data Set

from pandas import DataFrame
# from sklearn.datasets import load_iris
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# data_b = load_iris()
data_b = load_breast_cancer()
df = DataFrame(data_b.data, columns=data_b.feature_names)
df['target'] = data_b.target
# print(df)
# print(data_b.DESCR)
print("Dataset Labels=", data_b.target_names)

X_train, X_test, Y_train, y_test = train_test_split(df[data_b.feature_names], df['target'], random_state=1)
print(X_train.head(6))
print(Y_train.head(6))
print(X_test.head())

clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(X_train, Y_train)  # model is trained
y_pred = clf.predict(X_test)
# print(y_test, y_pred)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)

OUTPUT:
C:\Users\sejal\MCA-I_ML\Scripts\python.exe C:/Users/sejal/PycharmProjects/MCA-I_ML/KNN.py
Dataset Labels= ['malignant' 'benign']
     mean radius  mean texture  ...  worst symmetry  worst fractal dimension
562        15.22         30.62  ...          0.4089                  0.14090
291        14.96         19.10  ...          0.2962                  0.08472
16         14.68         20.13  ...          0.3029                  0.08216
546        10.32         16.35  ...          0.2681                  0.07399
293        11.85         17.46  ...          0.3101                  0.07007
350        11.66         17.07  ...          0.2731                  0.06825

[6 rows x 30 columns]
562    0
291    1
16     0
546    1
293    1
350    1
Name: target, dtype: int32
     mean radius  mean texture  ...  worst symmetry  worst fractal dimension
421        14.69         13.98  ...          0.2827                  0.09208
47         13.17         18.66  ...          0.3900                  0.11790
292        12.95         16.02  ...          0.3380                  0.09584
186        18.31         18.58  ...          0.3206                  0.06938
414        15.13         29.81  ...          0.3233                  0.06165

[5 rows x 30 columns]
Accuracy: 0.9370629370629371
Confusion Matrix:
[[51  4]
 [ 5 83]]
Number of correct predictions= 134
Number of wrong predictions =  9

Process finished with exit code 0

Practical No: 7
Practical Name: Write a program to implement the k-Nearest Neighbour algorithm to classify the iris dataset. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.

Code: For Iris Data Set

from pandas import DataFrame
from sklearn.datasets import load_iris
# from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

data_b = load_iris()
# data_b = load_breast_cancer()
df = DataFrame(data_b.data, columns=data_b.feature_names)
df['target'] = data_b.target
# print(df)
# print(data_b.DESCR)
print("Dataset Labels=", data_b.target_names)

X_train, X_test, Y_train, y_test = train_test_split(df[data_b.feature_names], df['target'], random_state=1)
print(X_train.head(6))
print(Y_train.head(6))
print(X_test.head())

clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(X_train, Y_train)  # model is trained
y_pred = clf.predict(X_test)
# print(y_test, y_pred)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

# count correct predictions from the diagonal of the confusion matrix
# corr = cm[0, 0] + cm[1, 1] + cm[2, 2]  # ----for iris
# corr = cm[0, 0] + cm[1, 1]             # ----for breast cancer
corr = 0
for i in range(len(data_b.target_names)):
    corr = corr + cm[i, i]
wrg = len(y_test) - corr
print("Number of correct predictions=", corr)
print("Number of wrong predictions = ", wrg)

OUTPUT:
C:\Users\sejal\MCA-I_ML\Scripts\python.exe C:/Users/sejal/PycharmProjects/MCA-I_ML/KNN.py
Dataset Labels= ['setosa' 'versicolor' 'virginica']
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
54                 6.5               2.8                4.6               1.5
108                6.7               2.5                5.8               1.8
112                6.8               3.0                5.5               2.1
17                 5.1               3.5                1.4               0.3
119                6.0               2.2                5.0               1.5
103                6.3               2.9                5.6               1.8
54     1
108    2
112    2
17     0
119    2
103    2
Name: target, dtype: int32
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
14                 5.8               4.0                1.2               0.2
98                 5.1               2.5                3.0               1.1
75                 6.6               3.0                4.4               1.4
16                 5.4               3.9                1.3               0.4
131                7.9               3.8                6.4               2.0
Accuracy: 1.0
Confusion Matrix:
[[13  0  0]
 [ 0 16  0]
 [ 0  0  9]]
Number of correct predictions= 38
Number of wrong predictions =  0

Process finished with exit code 0

####################################################################
Code: For Breast Cancer Data Set

from pandas import DataFrame
# from sklearn.datasets import load_iris
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# data_b = load_iris()
data_b = load_breast_cancer()
df = DataFrame(data_b.data, columns=data_b.feature_names)
df['target'] = data_b.target
# print(df)
# print(data_b.DESCR)
print("Dataset Labels=", data_b.target_names)

X_train, X_test, Y_train, y_test = train_test_split(df[data_b.feature_names], df['target'], random_state=1)
print(X_train.head(6))
print(Y_train.head(6))
print(X_test.head())

clf = KNeighborsClassifier(n_neighbors=6)
clf.fit(X_train, Y_train)  # model is trained
y_pred = clf.predict(X_test)
# print(y_test, y_pred)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

# count correct predictions from the diagonal of the confusion matrix
# corr = cm[0, 0] + cm[1, 1] + cm[2, 2]  # ----for iris
# corr = cm[0, 0] + cm[1, 1]             # ----for breast cancer
corr = 0
for i in range(len(data_b.target_names)):
    corr = corr + cm[i, i]
wrg = len(y_test) - corr
print("Number of correct predictions=", corr)
print("Number of wrong predictions = ", wrg)

OUTPUT:
C:\Users\sejal\MCA-I_ML\Scripts\python.exe C:/Users/sejal/PycharmProjects/MCA-I_ML/KNN.py
Dataset Labels= ['malignant' 'benign']
     mean radius  mean texture  ...  worst symmetry  worst fractal dimension
562        15.22         30.62  ...          0.4089                  0.14090
291        14.96         19.10  ...          0.2962                  0.08472
16         14.68         20.13  ...          0.3029                  0.08216
546        10.32         16.35  ...          0.2681                  0.07399
293        11.85         17.46  ...          0.3101                  0.07007
350        11.66         17.07  ...          0.2731                  0.06825

[6 rows x 30 columns]
562    0
291    1
16     0
546    1
293    1
350    1
Name: target, dtype: int32
     mean radius  mean texture  ...  worst symmetry  worst fractal dimension
421        14.69         13.98  ...          0.2827                  0.09208
47         13.17         18.66  ...          0.3900                  0.11790
292        12.95         16.02  ...          0.3380                  0.09584
186        18.31         18.58  ...          0.3206                  0.06938
414        15.13         29.81  ...          0.3233                  0.06165

[5 rows x 30 columns]
Accuracy: 0.9370629370629371
Confusion Matrix:
[[51  4]
 [ 5 83]]
Number of correct predictions= 134
Number of wrong predictions =  9

Process finished with exit code 0

Practical No.: 8
Practical Name: Write a Program for Confusion Matrix and calculate Precision, Recall, F-Measure

from sklearn.datasets import load_iris, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Load the Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target

# Split the Iris dataset into training and testing sets
X_train_iris, X_test_iris, y_train_iris, y_test_iris = train_test_split(X_iris, y_iris, test_size=0.2, random_state=42)

# Train the KNN classifier on the Iris dataset
knn_iris = KNeighborsClassifier()
knn_iris.fit(X_train_iris, y_train_iris)

# Make predictions on the Iris testing set
y_pred_iris = knn_iris.predict(X_test_iris)

# Calculate the confusion matrix for the Iris dataset
cm_iris = confusion_matrix(y_test_iris, y_pred_iris)
print("Confusion Matrix (Iris Dataset):")
print(cm_iris)

# Calculate precision, recall, and F-measure for the Iris dataset
# (macro averaging, since iris has three classes)
precision_iris = precision_score(y_test_iris, y_pred_iris, average='macro')
recall_iris = recall_score(y_test_iris, y_pred_iris, average='macro')
f1_iris = f1_score(y_test_iris, y_pred_iris, average='macro')
print("Precision (Iris Dataset):", precision_iris)
print("Recall (Iris Dataset):", recall_iris)
print("F-measure (Iris Dataset):", f1_iris)

# Load the Breast Cancer dataset
cancer = load_breast_cancer()
X_cancer = cancer.data
y_cancer = cancer.target

# Split the Breast Cancer dataset into training and testing sets
X_train_cancer, X_test_cancer, y_train_cancer, y_test_cancer = train_test_split(X_cancer, y_cancer, test_size=0.2, random_state=42)

# Train the KNN classifier on the Breast Cancer dataset
knn_cancer = KNeighborsClassifier()
knn_cancer.fit(X_train_cancer, y_train_cancer)

# Make predictions on the Breast Cancer testing set
y_pred_cancer = knn_cancer.predict(X_test_cancer)

# Calculate the confusion matrix for the Breast Cancer dataset
cm_cancer = confusion_matrix(y_test_cancer, y_pred_cancer)
print("\nConfusion Matrix (Breast Cancer Dataset):")
print(cm_cancer)

# Calculate precision, recall, and F-measure for the Breast Cancer dataset
# (binary task, so the default averaging applies)
precision_cancer = precision_score(y_test_cancer, y_pred_cancer)
recall_cancer = recall_score(y_test_cancer, y_pred_cancer)
f1_cancer = f1_score(y_test_cancer, y_pred_cancer)
print("Precision (Breast Cancer Dataset):", precision_cancer)
print("Recall (Breast Cancer Dataset):", recall_cancer)
print("F-measure (Breast Cancer Dataset):", f1_cancer)

OUTPUT:
Confusion Matrix (Iris Dataset):
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Precision (Iris Dataset): 1.0
Recall (Iris Dataset): 1.0
F-measure (Iris Dataset): 1.0

Confusion Matrix (Breast Cancer Dataset):
[[38  5]
 [ 0 71]]
Precision (Breast Cancer Dataset): 0.9342105263157895
Recall (Breast Cancer Dataset): 1.0
F-measure (Breast Cancer Dataset): 0.9659863945578232

Practical No.: 9
Practical Name: Write a program for linear regression and find parameters like Sum of Squared Errors (SSE), Total Sum of Squares (SST), R2, Adjusted R2, etc.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Input data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([3, 4, 5, 6])

model = LinearRegression()  # Create a linear regression model
model.fit(X, y)             # Fit the model to the data
y_pred = model.predict(X)   # Predict the output

sse = np.sum((y_pred - y) ** 2)      # SSE (Sum of Squared Errors)
sst = np.sum((y - np.mean(y)) ** 2)  # SST (Total Sum of Squares)
r2 = r2_score(y, y_pred)             # R2 score

# Calculate adjusted R2
n = X.shape[0]  # Number of samples
p = X.shape[1]  # Number of predictors
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Print the results
print("Sum of Squared Errors(SSE):- ", sse)
print("Total Sum of Squares(SST):- ", sst)
print("R Square(R2):- ", r2)
print("Adjusted R Square(R2):- ", adjusted_r2)

OUTPUT:
Sum of Squared Errors(SSE):-  0.0
Total Sum of Squares(SST):-  5.0
R Square(R2):-  1.0
Adjusted R Square(R2):-  1.0

Practical – 10: Write a program to implement the naïve Bayesian classifier for a sample training dataset stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix
from sklearn import datasets

iris = datasets.load_iris()  # loading dataset
x = iris.data[:, ]  # input features
y = iris.target     # target
print("Features : ", iris['feature_names'])

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
NB = GaussianNB()
NB.fit(x_train, y_train)
Y_pred = NB.predict(x_test)
cm = confusion_matrix(y_test, Y_pred)
print("Confusion Matrix:- ", cm)
# the accuracy required by the task can be computed with
# sklearn.metrics.accuracy_score(y_test, Y_pred); it was not printed in this run

OUTPUT:
C:\Users\sejal\MCA-I_ML\Scripts\python.exe C:/Users/sejal/PycharmProjects/MCA-I_ML/Naive_bays_short.py
Features :  ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Confusion Matrix:-  [[13  0  0]
 [ 0 16  0]
 [ 0  0  9]]

Process finished with exit code 0

Practical – 11.1: Implementing Agglomerative Clustering in Python. (A sketch is given below.)

Practical – 11.2: Write a Program for Fuzzy c-means clustering in Python. (A sketch is given below.)

Practical – 12: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select the appropriate data set for your experiment and draw graphs. (A sketch is given below.)
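Practical – 11.1 (sketch): no code was recorded for this practical, so the following is a minimal agglomerative-clustering illustration using scikit-learn; the choice of the iris data and three clusters is an assumption for illustration.

from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

# iris is used here only as a convenient sample dataset (an assumption)
X = load_iris().data

# bottom-up clustering: each point starts as its own cluster and the two
# closest clusters are merged repeatedly; Ward linkage minimises the
# increase in within-cluster variance at every merge
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = model.fit_predict(X)
print("Cluster labels:", labels)

# visualise the clusters on the first two features
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.xlabel("sepal length (cm)")
plt.ylabel("sepal width (cm)")
plt.title("Agglomerative clustering of the iris data")
plt.show()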
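Practical – 11.2 (sketch): scikit-learn does not ship fuzzy c-means, so this sketch implements the standard centre/membership update equations directly in NumPy; the two-blob data set and the function name are illustrative assumptions.

import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means: returns cluster centres and the membership matrix."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # random initial membership matrix whose rows sum to 1
    U = rng.random((n, c))
    U = U / U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # centres are membership-weighted means of the points
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # distances of every point to every centre
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-10)  # avoid division by zero
        # membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        new_U = 1.0 / np.sum((dist[:, :, None] / dist[:, None, :]) ** (2 / (m - 1)), axis=2)
        if np.linalg.norm(new_U - U) < tol:
            U = new_U
            break
        U = new_U
    return centres, U

# illustrative 2-D data: two loose blobs (assumed, not from the lab)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
centres, U = fuzzy_c_means(X, c=2)
print("Centres:\n", centres)
print("Hard labels:", U.argmax(axis=1))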
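Practical – 12 (sketch): a minimal locally weighted regression that fits a weighted least-squares line around each query point using a Gaussian kernel; the noisy sine-wave data and the bandwidth value are illustrative assumptions.

import numpy as np
import matplotlib.pyplot as plt

def local_weighted_regression(x0, X, y, tau):
    # Gaussian kernel weights centred on the query point x0
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # weighted least squares: beta = (X^T W X)^-1 X^T W y
    beta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y
    return x0 @ beta

# synthetic noisy sine-wave data (assumed for illustration)
n = 100
x = np.linspace(0, 2 * np.pi, n)
y = np.sin(x) + 0.1 * np.random.randn(n)
X = np.column_stack([np.ones(n), x])  # add a bias column

tau = 0.5  # bandwidth: smaller tau follows the data more closely
y_fit = np.array([local_weighted_regression(np.array([1.0, xq]), X, y, tau) for xq in x])

plt.scatter(x, y, s=10, label="data")
plt.plot(x, y_fit, color="red", label="LWR fit (tau=0.5)")
plt.legend()
plt.show()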
Practical – 13.1: Construction of a Simple Neural Network using Python

Code:

import numpy as np
from scipy.special import expit as activation_function  # logistic sigmoid
from scipy.stats import truncnorm

# define the network
# generate numbers within a truncated (bounded) normal distribution
def truncated_normal(mean=0, sd=1, low=0, upp=10):
    return truncnorm((low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)

# create the Network class and define the arguments:
# set the number of neurons/nodes for each layer
# and initialize the weight matrices
class Nnetwork:
    def __init__(self, no_of_in_nodes, no_of_out_nodes, no_of_hidden_nodes, learning_rate):
        self.no_of_in_nodes = no_of_in_nodes
        self.no_of_out_nodes = no_of_out_nodes
        self.no_of_hidden_nodes = no_of_hidden_nodes
        self.learning_rate = learning_rate
        self.create_weight_matrices()

    def create_weight_matrices(self):
        """A method to initialize the weight matrices of the neural network."""
        rad = 1 / np.sqrt(self.no_of_in_nodes)
        x = truncated_normal(mean=0, sd=1, low=-rad, upp=rad)
        self.weight_in_hidden = x.rvs((self.no_of_hidden_nodes, self.no_of_in_nodes))
        print("weights_in_hidden = ", self.weight_in_hidden)
        rad = 1 / np.sqrt(self.no_of_hidden_nodes)
        x = truncated_normal(mean=0, sd=1, low=-rad, upp=rad)
        self.weight_in_hidden_out = x.rvs((self.no_of_out_nodes, self.no_of_hidden_nodes))
        print("weights_in_hidden_out = ", self.weight_in_hidden_out)

    def train(self, input_vector, target_vector):
        pass  # training is implemented with backpropagation in Practical 13.2

    def run(self, input_vector):
        # forward pass: input -> hidden -> output, sigmoid at each layer
        input_vector = np.array(input_vector, ndmin=2).T
        print("Input = ", input_vector)
        input_hidden = activation_function(self.weight_in_hidden @ input_vector)
        print("Hidden = ", input_hidden)
        output_vector = activation_function(self.weight_in_hidden_out @ input_hidden)
        print("Output = ", output_vector)
        return output_vector

simple_network = Nnetwork(no_of_in_nodes=2, no_of_out_nodes=2, no_of_hidden_nodes=4, learning_rate=0.6)
# run the simple network for arrays, lists and tuples with shape (2,):
y = simple_network.run([2, 3])
print("Y = ", y)

OUTPUT:
weights_in_hidden =  [[-0.68798443  0.29428266]
 [ 0.57363879 -0.64646032]
 [-0.38809421  0.07104818]
 [-0.23288421  0.26427463]]
weights_in_hidden_out =  [[ 0.12718945 -0.15067287 -0.36574728  0.3725497 ]
 [-0.09102931 -0.22077172  0.40025881 -0.32163589]]
Input =  [[2]
 [3]]
Hidden =  [[0.37915865]
 [0.31171721]
 [0.36284346]
 [0.58104275]]
Output =  [[0.52124119]
 [0.46381691]]
Y =  [[0.52124119]
 [0.46381691]]

Practical No – 13.2: Classification of the Iris Dataset by Applying an Artificial Neural Network with the Back-Propagation Algorithm

# classification of the iris data set using an artificial neural network
# trained with the back-propagation algorithm
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# load dataset
data = load_iris()

# get features and target
x = data.data
y = data.target
print("Y=", y)
y = pd.get_dummies(y).values  # one-hot encode the three classes
print(y[:3])

# split data into train and test sets (20 test samples)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=20, random_state=4)

# initialize hyperparameters
learning_rate = 0.1
iteration = 6000
N = y_train.size  # total number of entries in the one-hot training targets

# number of input features
input_size = 4
# number of hidden-layer neurons
hidden_size = 2
# number of neurons at the output layer
output_size = 3

results = pd.DataFrame(columns=["mse", "accuracy"])

# initialize weights
np.random.seed(10)
# initializing weights for the hidden layer
W1 = np.random.normal(scale=0.5, size=(input_size, hidden_size))
print("weight 1", W1)
# initializing weights for the output layer
W2 = np.random.normal(scale=0.5, size=(hidden_size, output_size))
print("weight 2", W2)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def mean_squared_error(y_pred, y_true):
    return (((y_pred - y_true) ** 2).sum()) / (2 * y_pred.size)

def accuracy(y_pred, y_true):
    acc = y_pred.argmax(axis=1) == y_true.argmax(axis=1)
    return acc.mean()

for itr in range(iteration):
    # feed-forward propagation
    # on the hidden layer
    Z1 = np.dot(x_train, W1)
    A1 = sigmoid(Z1)
    # on the output layer
    Z2 = np.dot(A1, W2)
    A2 = sigmoid(Z2)

    # calculating error
    mse = mean_squared_error(A2, y_train)
    acc = accuracy(A2, y_train)
    # _append is the pandas 2.x private replacement for the removed DataFrame.append
    results = results._append({"mse": mse, "accuracy": acc}, ignore_index=True)

    # backpropagation
    E1 = A2 - y_train
    dw1 = E1 * A2 * (1 - A2)
    E2 = np.dot(dw1, W2.T)
    dw2 = E2 * A1 * (1 - A1)

    # weight updates
    W2_update = np.dot(A1.T, dw1) / N
    W1_update = np.dot(x_train.T, dw2) / N
    W2 = W2 - learning_rate * W2_update
    W1 = W1 - learning_rate * W1_update

results.mse.plot(title="Mean Squared Error")
results.accuracy.plot(title="Accuracy")

# feed-forward on the test set
Z1 = np.dot(x_test, W1)
A1 = sigmoid(Z1)
Z2 = np.dot(A1, W2)
A2 = sigmoid(Z2)
acc = accuracy(A2, y_test)
print("Accuracy: {}".format(acc))

OUTPUT:
C:\Users\sejal\MCA-I_ML\Scripts\python.exe C:/Users/sejal/PycharmProjects/MCA-I_ML/nural_network_Backpropa_algo.py
Y= [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
[[ True False False]
 [ True False False]
 [ True False False]]
weight 1 [[ 0.66579325  0.35763949]
 [-0.77270015 -0.00419192]
 [ 0.31066799 -0.36004278]
 [ 0.13275579  0.05427426]]
weight 2 [[ 0.00214572 -0.08730011  0.21651309]
 [ 0.60151869 -0.48253284  0.51413704]]

-----------------------------------------------------XXX-----------------------------------------------------