CS231n assignment1(KNN)
Related CS231n course notes: Image Classification
This assignment uses the KNN algorithm to classify images from the CIFAR-10 dataset.
The KNN model
train
For KNN, training does nothing more than store the training data in the classifier.
def train(self, X, y):
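The body of train is not shown above, so here is a minimal sketch of what "just store the data" could look like. The class name KNearestNeighbor and the attribute names X_train and y_train are assumptions made for illustration, not necessarily what the assignment code uses.

```python
import numpy as np


class KNearestNeighbor:
    """Hypothetical minimal classifier; only the train step is sketched."""

    def train(self, X, y):
        # KNN is a lazy learner: "training" only memorizes the data.
        self.X_train = X  # assumed shape: (num_train, D)
        self.y_train = y  # assumed shape: (num_train,)


# Tiny usage example with placeholder data.
clf = KNearestNeighbor()
clf.train(np.zeros((5, 3)), np.arange(5))
```

All of the real work is deferred to prediction time, which is why the distance computation further down is where the cost sits.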
predict
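Predict labels for the test data.
Three implementations of the distance computation are provided, using zero, one, and two explicit loops respectively.
Step 1: compute the distance from every test sample to every training sample.
def predict(self, X, k=1, num_loops=0):
Step 2: based on those distances, pick the k nearest training samples and let them vote.
def predict_labels(self, dists, k=1):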
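The voting step can be sketched as a standalone function. This is one possible version, not the assignment's own code: it takes y_train as an extra argument (an assumption, since the method above reads it from self) and assumes labels are non-negative integers so that np.bincount applies.

```python
import numpy as np


def predict_labels(dists, y_train, k=1):
    """Majority vote among the k nearest training samples.

    dists:   (num_test, num_train) distance matrix
    y_train: (num_train,) non-negative integer labels
    """
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test, dtype=y_train.dtype)
    for i in range(num_test):
        # Indices of the k smallest distances in row i.
        nearest = np.argsort(dists[i])[:k]
        closest_y = y_train[nearest]
        # Count votes per label; argmax breaks ties toward the smaller label.
        y_pred[i] = np.argmax(np.bincount(closest_y))
    return y_pred


dists = np.array([[0.1, 0.9, 0.2, 0.8],
                  [0.9, 0.1, 0.8, 0.2]])
y_train = np.array([0, 1, 0, 1])
y_pred = predict_labels(dists, y_train, k=3)  # labels 0 and 1 for the two rows
```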
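A few NumPy functions worth remembering:
np.argsort(a) returns the indices that would sort a; np.sort(a) returns the sorted values.
np.argmax(a) returns the index of the largest value in a.
np.bincount(a) returns an array c in which c[i] is the number of times i occurs in a (note: the elements of a must be non-negative integers). It is equivalent to:
c = [0] * (max(a) + 1)
for i in a: c[i] += 1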
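A quick check of the three functions on toy arrays (the values are made up purely for illustration):

```python
import numpy as np

a = np.array([30, 10, 20])
order = np.argsort(a)        # indices that sort a: [1, 2, 0]
values = np.sort(a)          # sorted values: [10, 20, 30]
top = np.argmax(a)           # index of the largest value: 0

labels = np.array([1, 3, 1, 0, 1])
counts = np.bincount(labels)     # counts for 0..3: [1, 3, 0, 1]
winner = np.argmax(counts)       # most frequent label: 1
```

Chaining np.bincount with np.argmax like this is the usual way to take a majority vote over integer labels.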
compute_distances
two_loop
def compute_distances_two_loops(self, X):
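A sketch of the two-loop version, written as a standalone function. It assumes Euclidean (L2) distance, which this post does not state explicitly, and takes X_train as a parameter instead of reading it from self.

```python
import numpy as np


def compute_distances_two_loops(X_train, X):
    """L2 distance between every test row of X and every row of X_train."""
    num_test, num_train = X.shape[0], X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # Distance between one test vector and one training vector.
            dists[i, j] = np.sqrt(np.sum((X[i] - X_train[j]) ** 2))
    return dists


X_train = np.array([[0.0, 0.0], [3.0, 4.0]])
X = np.array([[0.0, 0.0]])
d_two = compute_distances_two_loops(X_train, X)  # one row: distances 0 and 5
```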
one_loop
def compute_distances_one_loop(self, X):
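The one-loop version can be sketched by letting broadcasting handle the inner loop over training samples. As before, this assumes L2 distance and a standalone function signature chosen for illustration.

```python
import numpy as np


def compute_distances_one_loop(X_train, X):
    """L2 distances using a single loop over the test samples."""
    num_test, num_train = X.shape[0], X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        # X[i] (shape (D,)) broadcasts against X_train (shape (num_train, D)).
        dists[i, :] = np.sqrt(np.sum((X_train - X[i]) ** 2, axis=1))
    return dists


X_train = np.array([[0.0, 0.0], [3.0, 4.0]])
X = np.array([[0.0, 0.0], [3.0, 4.0]])
d_one = compute_distances_one_loop(X_train, X)
```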
no_loop
def compute_distances_no_loops(self, X):
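One common way to drop both loops is to expand the squared distance as ||x - t||^2 = ||x||^2 + ||t||^2 - 2 x·t, so the cross term becomes a single matrix product. The sketch below assumes L2 distance and a standalone signature; the clamp to zero is an added safeguard against small negative values caused by floating-point rounding.

```python
import numpy as np


def compute_distances_no_loops(X_train, X):
    """Fully vectorized L2 distances via the expanded squared norm."""
    test_sq = np.sum(X ** 2, axis=1, keepdims=True)   # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)           # (num_train,)
    cross = X @ X_train.T                             # (num_test, num_train)
    sq = test_sq + train_sq - 2.0 * cross             # broadcasts to full matrix
    # Rounding can push tiny values below zero; clamp before the square root.
    return np.sqrt(np.maximum(sq, 0.0))


rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 4))
X = rng.normal(size=(3, 4))
d_fast = compute_distances_no_loops(X_train, X)
# Reference result from explicit pairwise differences.
d_ref = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
```

Comparing against a slower reference, as in the last line, is a cheap way to confirm the vectorized version before timing it.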
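Cross-validation
For the theory, see the "watermelon book" (Zhou Zhihua's Machine Learning). See also: Watermelon Book Notes 2.
The goal here is to use cross-validation to evaluate the classifier and to tune KNN's k.
Set the number of folds and the candidate values of k, then split the dataset: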
import numpy as np

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]
# Split the training set into num_folds roughly equal folds.
X_train_folds = np.array_split(X_train, num_folds)
Y_train_folds = np.array_split(y_train, num_folds)
Brute-force search for the best k among the candidates:
k_to_accuracies = {}
for k in k_choices:
    k_to_accuracies[k] = []
for i in range(num_folds):
    # Fold i is held out for validation; the remaining folds are used for training.
    X_te = X_train_folds[i]
    y_te = Y_train_folds[i]
    X_tr = np.vstack(X_train_folds[:i] + X_train_folds[i+1:])
    y_tr = np.hstack(Y_train_folds[:i] + Y_train_folds[i+1:])
    classifier.train(X_tr, y_tr)
    # Evaluate every candidate k on the held-out fold.
    for k_now in k_choices:
        y_test_pred = classifier.predict(X_te, k=k_now)
        num_correct = np.sum(y_test_pred == y_te)
        accuracy = float(num_correct) / y_te.shape[0]
        k_to_accuracies[k_now].append(accuracy)
# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
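The loop above only prints the per-fold accuracies. One simple way to turn them into a choice of k is to average over folds and take the maximum. The sketch below uses made-up toy accuracies, not results from this experiment, and the names mean_acc and best_k are introduced here for illustration.

```python
import numpy as np

# Toy placeholder values, NOT real results: k -> accuracy on each fold.
k_to_accuracies = {
    1: [0.25, 0.27],
    5: [0.30, 0.28],
    10: [0.29, 0.27],
}

# Average accuracy across folds for each candidate k.
mean_acc = {k: float(np.mean(v)) for k, v in k_to_accuracies.items()}
# The candidate with the highest mean accuracy wins.
best_k = max(mean_acc, key=mean_acc.get)
```

With the real k_to_accuracies dictionary in place of the toy one, best_k would be the value to use when retraining on the full training set.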
Plotted, the final results look like this: [figure]
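A few more NumPy functions worth remembering:
np.array_split(a, num) splits a into num sub-arrays of (nearly) equal size.
np.vstack(tup): tup is a tuple whose elements are each an np.array; vstack stacks those arrays vertically.
np.hstack(tup): unlike vstack, hstack stacks the arrays horizontally.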
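To see how these three functions combine in the fold-splitting code, here is a small self-contained sketch on toy data (the arrays and the held-out index are arbitrary choices for illustration):

```python
import numpy as np

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y = np.arange(10)                  # 10 labels

# array_split returns a list of arrays, so folds can be joined with "+".
X_folds = np.array_split(X, 5)
y_folds = np.array_split(y, 5)

i = 1  # index of the held-out fold
X_tr = np.vstack(X_folds[:i] + X_folds[i + 1:])   # stack rows: shape (8, 2)
y_tr = np.hstack(y_folds[:i] + y_folds[i + 1:])   # join 1-D labels: shape (8,)

# Unlike np.split, array_split tolerates sizes that do not divide evenly.
uneven_sizes = [len(part) for part in np.array_split(np.arange(7), 3)]
```

For 1-D label arrays, hstack simply concatenates them end to end, which is why it is used for y while vstack is used for the 2-D feature matrix.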