My blog: https://yunist.cn

Programming examples for Chapter 6 of "Machine Learning" (the Watermelon Book)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("work/西瓜数据集3.0α.txt")
data

# Split the samples by label: '是' marks a good melon, '否' a bad one.
yes = data[data['Good melon'] == '是']
no = data[data['Good melon'] == '否']
# Plot both classes in the (density, sugar content) plane.
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(yes['Density'], yes['Sugar content'], marker='o', c='b', label='Yes')
ax.scatter(no['Density'], no['Sugar content'], marker='x', c='r', label='No')
ax.legend()
ax.set_xlabel('Density')
ax.set_ylabel('Sugar content')
plt.show()

The scatter plot shows no clear-cut decision boundary between the two classes.

Linear kernel and Gaussian (RBF) kernel

from sklearn import svm
linear_svc = svm.SVC(kernel='linear')
# gamma is left at its default: 'auto' in scikit-learn < 0.22 (used here), 'scale' from 0.22 on.
rbf_svc = svm.SVC(kernel='rbf')

# Encode the labels as +1 / -1. y must be 1-D with shape (n_samples,).
temp = {'是': 1, '否': -1}
X = np.array(data.iloc[:, :2])
y = np.array(data.iloc[:, 2].map(temp))

linear_svc.fit(X, y)
rbf_svc.fit(X, y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
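The RBF results depend on `gamma`, whose default changed from `'auto'` to `'scale'` in scikit-learn 0.22. The sketch below spells out what the two settings evaluate to on this dataset, following the documented definitions (`'auto'` is `1 / n_features`, `'scale'` is `1 / (n_features * X.var())`). The data is copied from the table in this post so the block runs without the CSV file. The variable names are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn import svm

# Density and sugar content of the 17 melons (watermelon dataset 3.0α),
# copied from the table in this post; the first 8 rows are good melons.
X = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
    [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
    [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
    [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
    [0.719, 0.103],
])
y = np.array([1] * 8 + [-1] * 9)

n_features = X.shape[1]
gamma_auto = 1.0 / n_features                # what gamma='auto' uses
gamma_scale = 1.0 / (n_features * X.var())   # what gamma='scale' uses (scikit-learn >= 0.22)
print("gamma 'auto': ", gamma_auto)
print("gamma 'scale':", gamma_scale)

# Fit one RBF SVC per setting and compare the training accuracy.
for name, gamma in [("auto", gamma_auto), ("scale", gamma_scale)]:
    model = svm.SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(f"gamma={name}: training accuracy = {model.score(X, y):.3f}")
```

Because both features lie in a narrow range, their variance is small and `'scale'` gives a much larger `gamma` than `'auto'`, so the fitted model can differ noticeably between scikit-learn versions unless `gamma` is set explicitly.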

Inspecting the support vectors

linear_svc.support_vectors_

array([[0.666, 0.091],
       [0.243, 0.267],
       [0.343, 0.099],
       [0.639, 0.161],
       [0.657, 0.198],
       [0.36 , 0.37 ],
       [0.593, 0.042],
       [0.719, 0.103],
       [0.697, 0.46 ],
       [0.774, 0.376],
       [0.634, 0.264],
       [0.608, 0.318],
       [0.556, 0.215],
       [0.403, 0.237],
       [0.481, 0.149],
       [0.437, 0.211]])

rbf_svc.support_vectors_

array([[0.666, 0.091],
       [0.243, 0.267],
       [0.343, 0.099],
       [0.639, 0.161],
       [0.657, 0.198],
       [0.36 , 0.37 ],
       [0.593, 0.042],
       [0.719, 0.103],
       [0.697, 0.46 ],
       [0.774, 0.376],
       [0.634, 0.264],
       [0.608, 0.318],
       [0.556, 0.215],
       [0.403, 0.237],
       [0.481, 0.149],
       [0.437, 0.211]])
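Both kernels keep 16 of the 17 samples as support vectors, which fits the heavily overlapping classes seen in the scatter plot. As a rough sketch, the fitted attributes `support_` and `n_support_` give the indices and per-class counts directly, and refitting with a few values of `C` shows how the penalty affects the count. The `C` values and variable names below are illustrative assumptions, and the data is copied from the table in this post.

```python
import numpy as np
from sklearn import svm

# Density and sugar content of the 17 melons (watermelon dataset 3.0α),
# copied from the table in this post; the first 8 rows are good melons.
X = np.array([
    [0.697, 0.460], [0.774, 0.376], [0.634, 0.264], [0.608, 0.318],
    [0.556, 0.215], [0.403, 0.237], [0.481, 0.149], [0.437, 0.211],
    [0.666, 0.091], [0.243, 0.267], [0.245, 0.057], [0.343, 0.099],
    [0.639, 0.161], [0.657, 0.198], [0.360, 0.370], [0.593, 0.042],
    [0.719, 0.103],
])
y = np.array([1] * 8 + [-1] * 9)

sv_counts = {}
for C in (1.0, 10.0, 100.0):  # illustrative penalty values
    model = svm.SVC(kernel="linear", C=C).fit(X, y)
    # support_ holds the row indices of the support vectors,
    # n_support_ the number of support vectors for each class.
    sv_counts[C] = len(model.support_)
    print(f"C={C:g}: {sv_counts[C]} of {len(X)} samples are support vectors "
          f"(per class: {model.n_support_.tolist()}), "
          f"training accuracy = {model.score(X, y):.3f}")
```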

SVM performance on the iris dataset

from sklearn import datasets
iris = datasets.load_iris()

X = iris['data']
y = iris['target']  # already 1-D with shape (n_samples,)

linear_svc.fit(X, y)
rbf_svc.fit(X, y)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

linear_svc.score(X, y), rbf_svc.score(X, y)

(0.9933333333333333, 0.9866666666666667)

Comparison with a decision tree

from sklearn import tree
clf = tree.DecisionTreeClassifier()

clf.fit(X, y)
clf.score(X, y)

1.0
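The scores above are measured on the same samples the models were trained on, so they are optimistic. An unpruned decision tree in particular can memorize the training set. A fairer comparison, sketched below, uses 5-fold cross-validation via `cross_val_score`. The fold count, `random_state=0`, and the explicit `gamma='auto'` are assumptions chosen for illustration, not settings from the original notebook.

```python
from sklearn import datasets, svm, tree
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

models = {
    "linear SVC": svm.SVC(kernel="linear"),
    "RBF SVC": svm.SVC(kernel="rbf", gamma="auto"),
    "decision tree": tree.DecisionTreeClassifier(random_state=0),
}

cv_means = {}
for name, model in models.items():
    # Each fold is scored on samples the model did not see during fitting.
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy = {scores.mean():.3f} (std {scores.std():.3f})")
```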

Training an SVR

Here "Density" is the input and "Sugar content" is the output.

# The single input feature must be 2-D, shape (n_samples, 1); the target must be 1-D.
X_density = np.array(data.iloc[:, [0]])
y_sugar = np.array(data.iloc[:, 1])

# gamma is left at its default: 'auto' in scikit-learn < 0.22 (used here), 'scale' from 0.22 on.
rbf_svr = svm.SVR(kernel='rbf')
rbf_svr.fit(X_density, y_sugar)

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
  gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
  tol=0.001, verbose=False)

rbf_svr.score(X_density, y_sugar)

0.035749609530696835

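This is a weak fit. For a regressor, `score` returns the coefficient of determination R², and a value of about 0.036 means the model explains almost none of the variance in sugar content. With these default hyperparameters, density alone is a poor predictor of sugar content; note that the default epsilon=0.1 is large relative to the range of the target.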
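To make the number returned by `score` concrete: for regressors it is the coefficient of determination, R² = 1 - SS_res / SS_tot, where a model that always predicts the mean of the target gets exactly 0. The sketch below recomputes it by hand on the same data. The explicit `gamma='auto'` is an assumption meant to mimic the pre-0.22 default, so the printed value may differ from the one above on other versions.

```python
import numpy as np
from sklearn import svm

# Density (input) and sugar content (target) of the 17 melons,
# copied from the table in this post.
density = np.array([0.697, 0.774, 0.634, 0.608, 0.556, 0.403, 0.481, 0.437, 0.666,
                    0.243, 0.245, 0.343, 0.639, 0.657, 0.360, 0.593, 0.719])
sugar = np.array([0.460, 0.376, 0.264, 0.318, 0.215, 0.237, 0.149, 0.211, 0.091,
                  0.267, 0.057, 0.099, 0.161, 0.198, 0.370, 0.042, 0.103])

X_density = density.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
model = svm.SVR(kernel="rbf", gamma="auto").fit(X_density, sugar)
pred = model.predict(X_density)

ss_res = np.sum((sugar - pred) ** 2)          # residual sum of squares
ss_tot = np.sum((sugar - sugar.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

# Baseline: always predicting the mean gives R^2 = 0 by definition.
mean_pred = np.full_like(sugar, sugar.mean())
r2_baseline = 1.0 - np.sum((sugar - mean_pred) ** 2) / ss_tot

print("R^2 computed by hand:", r2_manual)
print("R^2 from model.score:", model.score(X_density, sugar))
print("R^2 of the mean-only baseline:", r2_baseline)
```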

Output of the `data` cell near the top of the notebook (watermelon dataset 3.0α; in the "Good melon" column, 是 = yes and 否 = no):

	Density	Sugar content	Good melon
0	0.697	0.460	是
1	0.774	0.376	是
2	0.634	0.264	是
3	0.608	0.318	是
4	0.556	0.215	是
5	0.403	0.237	是
6	0.481	0.149	是
7	0.437	0.211	是
8	0.666	0.091	否
9	0.243	0.267	否
10	0.245	0.057	否
11	0.343	0.099	否
12	0.639	0.161	否
13	0.657	0.198	否
14	0.360	0.370	否
15	0.593	0.042	否
16	0.719	0.103	否
