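By Natasha Latysheva.
Editor's note: Natasha is active in the Cambridge Coding Academy, which is holding a Data Science Bootcamp in Python on 20-21 February 2016, where you can learn state-of-the-art machine learning techniques for real-world problems.
In machine learning, you will often want to build classifiers that assign things to classes based on a set of associated values. For example, a patient could be given a diagnosis based on data from previous patients. Classification can involve constructing highly non-linear boundaries between classes, as with the red, green and blue classes shown below: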
![kNN boundaries](../Images/4322b5f9ae66be35375b65bec3b6c05f.png)
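Many algorithms have been developed for automated classification; common ones include random forests, support vector machines, Naive Bayes classifiers, and many types of neural networks. To get a feel for how classification works, we take a simple classification algorithm, k-Nearest Neighbours (kNN), as an example and build it from scratch in Python 2. If you are just starting out with Python, you can keep things simple by using a mostly imperative style of coding rather than a declarative/functional one with lambda functions and [list comprehensions](http://www.secnetix.de/olli/Python/list_comprehensions.hawk). Here, we will provide an introduction to the latter approach.
kNN classifies new instances by grouping them together with the most similar cases. Here, you will use kNN on the popular (if idealized) iris dataset, which consists of flower measurements for three species of iris flower. Our task is to predict the species label of a flower from its measurements. Since you will be building a predictor from a set of known correct classifications, kNN is a type of supervised machine learning (though, somewhat confusingly, kNN has no explicit training phase; see lazy learning). The kNN task can be broken down into writing 3 primary functions:
1. Calculate the distance between any two points
2. Find the nearest neighbours based on these pairwise distances
3. Take a majority vote on the class labels based on the nearest-neighbour list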
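To make the two coding styles mentioned above concrete, the first of these functions can be sketched both ways. This is a minimal illustration rather than the article's own code: the function names `distance_imperative` and `distance_comprehension` are hypothetical, and the snippet is written for Python 3.

```python
import math


def distance_imperative(p, q):
    """Euclidean distance written with an explicit loop (imperative style)."""
    total = 0.0
    for a, b in zip(p, q):
        total += (a - b) ** 2
    return math.sqrt(total)


def distance_comprehension(p, q):
    """The same distance written with a list comprehension (declarative style)."""
    return math.sqrt(sum([(a - b) ** 2 for a, b in zip(p, q)]))


# Both styles give the same answer for a 3-4-5 right triangle.
example = distance_imperative([0.0, 0.0], [3.0, 4.0])
```

Both versions do the same work; the comprehension simply folds the loop into a single expression.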
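The steps in the diagram below provide a high-level overview of the tasks you will need to accomplish in your code.
The algorithm
Briefly, you will build a script that, for each input that needs classification, searches through the entire training set for the k most similar instances. The class labels of the most similar instances are then summarised by majority voting and returned as the prediction for the test case.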
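The description above can be condensed into a few lines. The following is a rough sketch on made-up two-dimensional points, not the article's script: the name `knn_predict`, the toy data and the colour labels are all assumptions chosen for illustration, and the snippet targets Python 3.

```python
import math
from collections import Counter


def knn_predict(training_set, query, k):
    """Predict a label for `query` from a list of (features, label) pairs."""
    # Sort every training instance by its Euclidean distance to the query.
    by_distance = sorted(
        training_set,
        key=lambda pair: math.sqrt(sum((a - b) ** 2 for a, b in zip(pair[0], query))),
    )
    # Keep the labels of the k closest instances and take a majority vote.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
toy_training_set = [
    ((1.0, 1.0), "red"),
    ((1.2, 0.8), "red"),
    ((0.9, 1.1), "red"),
    ((5.0, 5.0), "blue"),
    ((5.2, 4.9), "blue"),
    ((4.8, 5.1), "blue"),
]

prediction = knn_predict(toy_training_set, (1.1, 1.0), k=3)
```

Because the whole training set is sorted for every query, this brute-force approach costs one full pass (plus a sort) per prediction, which is fine for a dataset as small as iris.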
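The full code is at the end of the article. For now, let's go through the different parts separately and explain what they do.
Loading the data and splitting into train and test sets
To get up and running quickly, you will use some helper functions: although we could download the iris data ourselves and load it with csv.reader, you can also fetch the iris data straight from scikit-learn. Further, you can do a 60/40 train/test split using the train_test_split function, although you could also randomly assign the rows yourself (see this type of implementation). In machine learning, the train/test split is used to guard against overfitting: training a model on the full dataset tends to leave it fitted to the noise and peculiarities of the data instead of the real, underlying trend. You do any sort of model tuning (for example, picking the number of neighbours, k) on the training set only; the test set acts as a stand-alone, untouched dataset that you use to test the performance of your final model.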
```python
from sklearn.datasets import load_iris
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

# load dataset and partition into training and testing sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.4, random_state=1)

# reformat train/test datasets for convenience: lists of (features, label) pairs
# (list(zip(...)) behaves the same on Python 2 and Python 3)
train = list(zip(X_train, y_train))
test = list(zip(X_test, y_test))
```
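As noted earlier, the rows can also be assigned to the two sets by hand. A minimal sketch of such a split follows, assuming a plain shuffle-and-slice approach; the helper `split_rows`, its parameters and the placeholder rows are hypothetical and are not taken from the implementation the article links to.

```python
import random


def split_rows(rows, test_fraction=0.4, seed=1):
    """Shuffle a copy of `rows` and return (train_rows, test_rows)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible, like random_state above.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical rows in the same (features, label) layout as `train` and `test`.
rows = [((float(i), float(i) * 2.0), i % 3) for i in range(10)]
train_rows, test_rows = split_rows(rows)
```

Unlike train_test_split, this sketch does nothing to keep the class proportions similar in the two sets, which matters more as datasets get smaller.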
Here is an overview of the iris dataset, the data split, and a quick guide to the indexing.
Ideally, k would be optimised by seeing which value produces the most accurate predictions (see [cross-validation](http://scikit-learn.org/stable/modules/cross_validation.html)).
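One way to picture this tuning step is a small leave-one-out loop over the training data only. The sketch below is an assumption-laden illustration, not scikit-learn's cross-validation API: it uses a hand-written leave-one-out procedure, and `knn_predict`, `leave_one_out_accuracy` and the toy data are hypothetical names introduced here (Python 3).

```python
import math
from collections import Counter


def knn_predict(training_set, query, k):
    """Majority vote among the k training instances closest to `query`."""
    by_distance = sorted(
        training_set,
        key=lambda pair: math.sqrt(sum((a - b) ** 2 for a, b in zip(pair[0], query))),
    )
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]


def leave_one_out_accuracy(training_set, k):
    """Hold out each instance in turn and predict it from the remaining ones."""
    correct = 0
    for i, (features, label) in enumerate(training_set):
        rest = training_set[:i] + training_set[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / float(len(training_set))


# Hypothetical training data: two clusters of three points each.
toy_training_set = [
    ((1.0, 1.0), "red"), ((1.2, 0.8), "red"), ((0.9, 1.1), "red"),
    ((5.0, 5.0), "blue"), ((5.2, 4.9), "blue"), ((4.8, 5.1), "blue"),
]

# Score a few candidate values of k and keep the smallest one with the best score.
scores = {k: leave_one_out_accuracy(toy_training_set, k) for k in (1, 3, 5)}
best_k = max(sorted(scores), key=scores.get)
```

On this toy data, k = 5 performs worst: once a point is held out, only two points of its own class remain, so the other class always wins the vote. The test set is never touched during this search, in line with the train/test discipline described earlier.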
A good overview of kNN can be read here. A more in-depth implementation, including weighting and search trees, is here.
The full script
The full script is below:
```python
# print_function keeps the script runnable on both Python 2.7 and Python 3
from __future__ import print_function

import math
from collections import Counter
from operator import itemgetter

from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, accuracy_score
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split


# 1) given two data points, calculate the euclidean distance between them
def get_distance(data1, data2):
    points = zip(data1, data2)
    diffs_squared_distance = [pow(a - b, 2) for (a, b) in points]
    return math.sqrt(sum(diffs_squared_distance))


# 2) given a training set and a test instance, use get_distance to calculate all pairwise distances
def get_neighbours(training_set, test_instance, k):
    distances = [_get_tuple_distance(training_instance, test_instance) for training_instance in training_set]
    # index 1 is the calculated distance between training_instance and test_instance
    sorted_distances = sorted(distances, key=itemgetter(1))
    # extract only training instances
    sorted_training_instances = [pair[0] for pair in sorted_distances]
    # select first k elements
    return sorted_training_instances[:k]


def _get_tuple_distance(training_instance, test_instance):
    # training_instance is a (features, label) pair; index 0 holds the features
    return (training_instance, get_distance(test_instance, training_instance[0]))


# 3) given an array of nearest neighbours for a test case, tally up their classes to vote on test case class
def get_majority_vote(neighbours):
    # index 1 is the class
    classes = [neighbour[1] for neighbour in neighbours]
    count = Counter(classes)
    return count.most_common()[0][0]


# setting up main executable method
def main():
    # load the data and create the training and test sets
    # random_state = 1 is just a seed to permit reproducibility of the train/test split
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.4, random_state=1)

    # reformat train/test datasets for convenience: lists of (features, label) pairs
    train = list(zip(X_train, y_train))
    test = list(zip(X_test, y_test))

    # generate predictions
    predictions = []

    # let's arbitrarily set k equal to 5, meaning that to predict the class of new instances,
    # the 5 nearest neighbours get a vote
    k = 5

    # for each instance in the test set, get nearest neighbours and majority vote on predicted class
    for x in range(len(X_test)):
        print('Classifying test instance number ' + str(x) + ':', end=' ')
        neighbours = get_neighbours(training_set=train, test_instance=test[x][0], k=k)
        majority_vote = get_majority_vote(neighbours)
        predictions.append(majority_vote)
        print('Predicted label=' + str(majority_vote) + ', Actual label=' + str(test[x][1]))

    # summarize performance of the classification
    print('\nThe overall accuracy of the model is: ' + str(accuracy_score(y_test, predictions)) + '\n')
    report = classification_report(y_test, predictions, target_names=iris.target_names)
    print('A detailed classification report: \n\n' + report)


if __name__ == "__main__":
    main()
```
Want to learn more? Check out our two-day Data Science Bootcamp:
https://cambridgecoding.com/datascience-bootcamp
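Bio: [Natasha Latysheva](http://blog.cambridgecoding.com/author/natlat/) is a computational biology PhD student at the MRC Laboratory of Molecular Biology. Her research focuses on cancer genomics, statistical network analysis, and protein structure. More broadly, her research interests include data-intensive molecular biology, machine learning (especially deep learning) and data science.
Original. Reposted with permission.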