Code implementations for "Machine Learning" (the Watermelon Book). Watermelon Book examples, Chapter 6

My blog: https://yunist.cn


In [22]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In [23]:
data = pd.read_csv("work/西瓜数据集3.0α.txt")
data
Out[23]:
    Density  Sugar content  Good melon
0     0.697          0.460          是
1     0.774          0.376          是
2     0.634          0.264          是
3     0.608          0.318          是
4     0.556          0.215          是
5     0.403          0.237          是
6     0.481          0.149          是
7     0.437          0.211          是
8     0.666          0.091          否
9     0.243          0.267          否
10    0.245          0.057          否
11    0.343          0.099          否
12    0.639          0.161          否
13    0.657          0.198          否
14    0.360          0.370          否
15    0.593          0.042          否
16    0.719          0.103          否
In [46]:
In [46]:
# Split rows by label: '是' = good melon, '否' = not a good melon
yes = data[data['Good melon'].isin(['是'])]
no = data[data['Good melon'].isin(['否'])]
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(yes['Density'], yes['Sugar content'], marker='o', c='b', label='Yes')
ax.scatter(no['Density'], no['Sugar content'], marker='x', c='r', label='No')
ax.legend()
ax.set_xlabel('Density')
ax.set_ylabel('Sugar content')
plt.show()

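There is no clear-cut boundary between the two classes: good and bad melons overlap in the density/sugar-content plane.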
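The visual impression can be checked exactly: a dataset is linearly separable if and only if some (w, b) satisfies y_i (w · x_i + b) ≥ 1 for every sample, which is a linear feasibility problem. The sketch below is a check of our own, not part of the original notebook. It hard-codes the 17 samples from the table and assumes rows 0-7 are the good melons (+1) and rows 8-16 are not (-1). It then asks `scipy.optimize.linprog` whether such a (w, b) exists.

```python
import numpy as np
from scipy.optimize import linprog

# Watermelon dataset 3.0α as printed above: (density, sugar content).
X = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
    [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
    [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
    [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
    [0.719, 0.103],
])
# Assumption: rows 0-7 are good melons (+1), rows 8-16 are not (-1).
y = np.array([1] * 8 + [-1] * 9)

# Unknowns z = (w1, w2, b). Strict linear separability is equivalent to the
# existence of z with y_i * (w . x_i + b) >= 1 for all i (rescale any separator).
# linprog expects A_ub @ z <= b_ub, so each constraint is negated.
A_ub = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])
b_ub = -np.ones(len(X))

# Pure feasibility problem: zero objective, all three variables unbounded.
res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3)

# res.success is False when no (w, b) satisfies every constraint.
print("linearly separable:", res.success)
print(res.message)
```

A failed solve here means no straight line separates the two classes. A hard-margin linear SVM would then have no solution, and that motivates the soft-margin `C` parameter and non-linear kernels that follow.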

Linear and Gaussian (RBF) kernels

In [24]:
from sklearn import svm
linear_svc = svm.SVC(kernel='linear')
rbf_svc = svm.SVC(kernel='rbf')
In [41]:
temp = {'是': 1, '否': -1}  # map labels: 是 (good) -> +1, 否 (not good) -> -1
X = np.array(data.iloc[:, :2])
y = data.iloc[:, 2].map(temp).values  # 1-D label vector, as scikit-learn expects
In [42]:
linear_svc.fit(X, y)
rbf_svc.fit(X, y)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/sklearn/svm/base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)
Out[42]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
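The FutureWarning above concerns the RBF width `gamma`. The fits in this notebook used the old default, shown as `gamma='auto_deprecated'` and equivalent to `'auto'`. Since 0.22 the default is `'scale'`. The sketch below computes both values by hand with scikit-learn's documented formulas: 1 / n_features for `'auto'`, and 1 / (n_features · X.var()) for `'scale'`. It then checks that passing the numeric value reproduces `gamma='scale'`. The data are the 17 samples from the table, with the same label assumption as before (rows 0-7 good).

```python
import numpy as np
from sklearn import svm

# Watermelon dataset 3.0α: (density, sugar content).
X = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
    [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
    [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
    [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
    [0.719, 0.103],
])
# Assumption: rows 0-7 are good melons (+1), rows 8-16 are not (-1).
y = np.array([1] * 8 + [-1] * 9)

# 'auto' ignores the data; 'scale' divides by the overall feature variance.
gamma_auto = 1.0 / X.shape[1]
gamma_scale = 1.0 / (X.shape[1] * X.var())
print("gamma auto:", gamma_auto, "gamma scale:", gamma_scale)

# Passing the numeric value explicitly should give the same model as 'scale'.
svc_scale = svm.SVC(kernel='rbf', gamma='scale').fit(X, y)
svc_manual = svm.SVC(kernel='rbf', gamma=gamma_scale).fit(X, y)
same = np.allclose(svc_scale.decision_function(X), svc_manual.decision_function(X))
print("identical decision functions:", same)
```

Both features lie roughly in [0, 0.8], so their variance is small and `'scale'` yields a much larger gamma than `'auto'`. A larger gamma gives a narrower, more flexible kernel. Setting `gamma` explicitly also silences the warning.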

Inspecting the support vectors

In [32]:
linear_svc.support_vectors_
Out[32]:
array([[0.666, 0.091],
       [0.243, 0.267],
       [0.343, 0.099],
       [0.639, 0.161],
       [0.657, 0.198],
       [0.36 , 0.37 ],
       [0.593, 0.042],
       [0.719, 0.103],
       [0.697, 0.46 ],
       [0.774, 0.376],
       [0.634, 0.264],
       [0.608, 0.318],
       [0.556, 0.215],
       [0.403, 0.237],
       [0.481, 0.149],
       [0.437, 0.211]])
In [30]:
rbf_svc.support_vectors_
Out[30]:
array([[0.666, 0.091],
       [0.243, 0.267],
       [0.343, 0.099],
       [0.639, 0.161],
       [0.657, 0.198],
       [0.36 , 0.37 ],
       [0.593, 0.042],
       [0.719, 0.103],
       [0.697, 0.46 ],
       [0.774, 0.376],
       [0.634, 0.264],
       [0.608, 0.318],
       [0.556, 0.215],
       [0.403, 0.237],
       [0.481, 0.149],
       [0.437, 0.211]])
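In both listings, 16 of the 17 samples are support vectors; only (0.245, 0.057) is missing. This is consistent with heavily overlapping classes. The rows labelled -1 come first, which appears to follow scikit-learn's convention of grouping support vectors by class in the order of `classes_`. The sketch below refits the linear model on the hard-coded table (same label assumption, rows 0-7 good) and inspects the fitted attributes. It also checks the dual-form identity from this chapter, w = Σ α_i y_i x_i. In scikit-learn, `dual_coef_` stores y_i α_i for each support vector, so for a linear kernel `coef_` should equal `dual_coef_ @ support_vectors_`.

```python
import numpy as np
from sklearn import svm

# Watermelon dataset 3.0α: (density, sugar content).
X = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
    [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
    [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
    [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
    [0.719, 0.103],
])
# Assumption: rows 0-7 are good melons (+1), rows 8-16 are not (-1).
y = np.array([1] * 8 + [-1] * 9)

clf = svm.SVC(kernel='linear', C=1.0).fit(X, y)

# support_ gives the row indices of the support vectors; n_support_ counts them per class.
print("indices:", clf.support_, "per class:", clf.n_support_)

# Soft margin: 0 < alpha_i <= C, so |y_i * alpha_i| never exceeds C.
print("alphas bounded by C:", np.all(np.abs(clf.dual_coef_) <= clf.C + 1e-9))

# w = sum_i alpha_i * y_i * x_i over the support vectors.
w = clf.dual_coef_ @ clf.support_vectors_
print("w from dual:", w, "coef_:", clf.coef_)

# The decision function is f(x) = w . x + b.
f = X @ w.ravel() + clf.intercept_[0]
print("matches decision_function:", np.allclose(f, clf.decision_function(X)))
```

Samples with |dual_coef_| equal to C sit on or inside the margin. With C = 1 and so much overlap, many of the support vectors are expected to be of that kind.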

SVM on the iris dataset (accuracy on the training data)

In [47]:
from sklearn import datasets
iris = datasets.load_iris()
In [50]:
X = iris['data']
y = iris['target']  # keep the targets as a 1-D array
In [54]:
linear_svc.fit(X, y)
rbf_svc.fit(X, y)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/sklearn/svm/base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)
Out[54]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
In [58]:
linear_svc.score(X, y), rbf_svc.score(X, y)
Out[58]:
(0.9933333333333333, 0.9866666666666667)
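The SVM in this chapter is a binary classifier, yet iris has three classes. scikit-learn's `SVC` handles this with one-vs-one: it trains one binary SVM per pair of classes and predicts by voting. The sketch below shows the effect of the `decision_function_shape` parameter, which appears in the `Out` repr above. We set `gamma='scale'` explicitly here, which differs from the fits above.

```python
from sklearn import datasets, svm

iris = datasets.load_iris()
X, y = iris['data'], iris['target']  # 150 samples, 4 features, 3 classes

# Same model, two layouts for decision_function.
ovo = svm.SVC(kernel='rbf', gamma='scale', decision_function_shape='ovo').fit(X, y)
ovr = svm.SVC(kernel='rbf', gamma='scale', decision_function_shape='ovr').fit(X, y)

n_classes = len(ovo.classes_)
n_pairs = n_classes * (n_classes - 1) // 2  # number of one-vs-one classifiers

print("pairwise classifiers:", n_pairs)
print("ovo shape:", ovo.decision_function(X).shape)  # one column per class pair
print("ovr shape:", ovr.decision_function(X).shape)  # one column per class

# decision_function_shape only changes the reported scores; predict() still
# uses the one-vs-one vote, so both models predict the same labels.
print("same predictions:", (ovo.predict(X) == ovr.predict(X)).all())
```

With three classes, both layouts happen to have three columns, since 3 · 2 / 2 = 3. With k > 3 classes the `'ovo'` output would have k(k - 1)/2 columns.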

Comparison with a decision tree

In [59]:
from sklearn import tree
clf = tree.DecisionTreeClassifier()
In [69]:
clf.fit(X, y)
clf.score(X, y)
Out[69]:
1.0
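All three scores above are measured on the data the models were trained on. A fully grown decision tree can fit its training set perfectly, so a training score of 1.0 says little about generalization. The sketch below compares the same three model types with 5-fold cross-validation instead. It is our own addition: the explicit `gamma='scale'` and `random_state=0` are assumptions, and it does not report the original notebook's results.

```python
from sklearn import datasets, svm, tree
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
X, y = iris['data'], iris['target']

models = {
    'linear SVC': svm.SVC(kernel='linear'),
    'rbf SVC': svm.SVC(kernel='rbf', gamma='scale'),
    'decision tree': tree.DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # cv=5 with a classifier uses stratified 5-fold splits; each score is the
    # accuracy on a fold that was held out while fitting.
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores
    print(f"{name}: mean {scores.mean():.3f}, std {scores.std():.3f}")
```

The held-out accuracies are the fairer basis for comparing SVMs with the tree.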

Training an SVR

Here "density" is the input feature and "sugar content" is the regression target.
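SVR differs from ordinary least-squares regression mainly in its loss. Residuals inside a tube of half-width ε cost nothing, and larger residuals are penalized linearly by how far they leave the tube. The sketch below implements this ε-insensitive loss. The helper name `eps_insensitive_loss` is our own. The default ε = 0.1 matches the `epsilon=0.1` shown in the `SVR` repr below.

```python
import numpy as np

def eps_insensitive_loss(z, epsilon=0.1):
    """epsilon-insensitive loss: 0 when |z| <= epsilon, else |z| - epsilon."""
    z = np.asarray(z, dtype=float)
    return np.maximum(np.abs(z) - epsilon, 0.0)

# Residuals (target minus prediction) inside and outside the tube.
residuals = np.array([-0.3, -0.1, 0.0, 0.05, 0.25])
print(eps_insensitive_loss(residuals))
```

Only the first and last residuals fall outside the ±0.1 tube and incur a loss. Sugar content in this dataset ranges only from about 0.04 to 0.46, so ε = 0.1 is a fairly wide tube here.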

In [72]:
rbf_svr = svm.SVR(kernel='rbf')
# Density (column 0) as a 2-D feature matrix, sugar content (column 1) as a 1-D target
rbf_svr.fit(data.iloc[:, [0]].values, data.iloc[:, 1].values)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/sklearn/svm/base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)
Out[72]:
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
  gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
  tol=0.001, verbose=False)
In [73]:
rbf_svr.score(data.iloc[:, [0]].values, data.iloc[:, 1].values)
Out[73]:
0.035749609530696835

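`SVR.score` returns the coefficient of determination R², and here it is only about 0.036. That is close to 0, so density alone explains very little of the variation in sugar content on this data and the fit is weak.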
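To see what the score measures, the sketch below recomputes R² = 1 - SS_res / SS_tot by hand and compares it with `sklearn.metrics.r2_score` and `SVR.score`. It hard-codes the density and sugar-content columns from the table. It assumes `gamma='auto'` to mirror the old default used in the cell above, so its score should be close to the one printed there, though we do not assert that value.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import r2_score

# Columns of the watermelon table above.
density = np.array([0.697, 0.774, 0.634, 0.608, 0.556, 0.403, 0.481, 0.437, 0.666,
                    0.243, 0.245, 0.343, 0.639, 0.657, 0.360, 0.593, 0.719])
sugar = np.array([0.460, 0.376, 0.264, 0.318, 0.215, 0.237, 0.149, 0.211, 0.091,
                  0.267, 0.057, 0.099, 0.161, 0.198, 0.370, 0.042, 0.103])

X = density[:, None]  # scikit-learn expects a 2-D feature matrix
svr = svm.SVR(kernel='rbf', gamma='auto').fit(X, sugar)  # 'auto' mirrors the old default

pred = svr.predict(X)
ss_res = np.sum((sugar - pred) ** 2)          # residual sum of squares
ss_tot = np.sum((sugar - sugar.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1.0 - ss_res / ss_tot

print("manual R^2:", r2_manual)
print("r2_score:  ", r2_score(sugar, pred))
print("svr.score: ", svr.score(X, sugar))
```

An R² of 1 would mean perfect predictions. An R² of 0 means the model does no better than always predicting the mean sugar content.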