Uploader: 高宏飞
Shared on 2025-12-22

Author: Chloe Annable

**Are you a budding programmer eager to delve into the realm of Python Machine Learning? Does the prospect of transitioning your existing programming knowledge to Python leave you perplexed?** Fear not! This comprehensive guide is tailored to address precisely those concerns and to help you navigate the intricacies of Python machine learning. In "Python Machine Learning: A Comprehensive Beginner's Guide with Scikit-Learn and TensorFlow," you will embark on a journey to:

- Understand the essence of machine learning
- Harness the power of Scikit-Learn and TensorFlow
- Grasp the significance of the 5 V's of Big Data
- Delve into the world of neural networks using Scikit-Learn
- Explore the intersection of machine learning and the Internet of Things (IoT)
- Implement the KNN algorithm with precision
- Decipher the nuances of determining the "k" parameter

This book is crafted with beginners in mind, providing clear, step-by-step instructions and straightforward language, making it an ideal starting point for anyone intrigued by this captivating subject. Python, with its immense capabilities, opens up a world of possibilities, and this guide will set you on the path to harnessing its potential. Embark on your Python Machine Learning journey today by acquiring your copy of "Python Machine Learning." Explore the boundless opportunities that await and gain insights into the future of technology!

Tags
No tags
ISBN: 1230007288869
Publisher: Chloe Annable
Publish Year: 2024
Language: English
Pages: 301
File Format: PDF
File Size: 1.5 MB
Text Preview (First 20 pages)
Python Machine Learning
A Step-by-Step Journey with Scikit-Learn and TensorFlow for Beginners
Chloe Annable
© Copyright 2024 - All rights reserved.

The content contained within this book may not be reproduced, duplicated or transmitted without direct written permission from the author or the publisher.

Under no circumstances will any blame or legal responsibility be held against the publisher, or author, for any damages, reparation, or monetary loss due to the information contained within this book, either directly or indirectly.

Legal Notice: This book is copyright protected. It is only for personal use. You cannot amend, distribute, sell, use, quote or paraphrase any part, or the content within this book, without the consent of the author or publisher.

Disclaimer Notice: Please note the information contained within this document is for educational and entertainment purposes only. All effort has been executed to present accurate, up to date, reliable, complete information. No warranties of any kind are declared or implied. Readers acknowledge that the author is not engaged in the rendering of legal, financial, medical or professional advice. The content within this book has been derived from various sources. Please consult a licensed professional before attempting any techniques outlined in this book.

By reading this document, the reader agrees that under no circumstances is the author responsible for any losses, direct or indirect, that are incurred as a result of the use of the information contained within this document, including, but not limited to, errors, omissions, or inaccuracies.
TABLE OF CONTENTS

INTRODUCTION
UNSUPERVISED MACHINE LEARNING
  Principal Component Analysis
  k-means Clustering
DEEP BELIEF NETWORKS
  Neural Networks
  The Restricted Boltzmann Machine
  Constructing Deep Belief Networks
CONVOLUTIONAL NEURAL NETWORKS
  Understanding the Architecture
  Connecting the Pieces
STACKED DENOISING AUTOENCODERS
  Autoencoders
SEMI-SUPERVISED LEARNING
  Understanding the Techniques
  Self-learning
  Contrastive Pessimistic Likelihood Estimation
TEXT FEATURE ENGINEERING
  Text Data Cleaning
  Building Features
MORE FEATURE ENGINEERING
  Creating Feature Sets
  Real-world Feature Engineering
ENSEMBLE METHODS
  Averaging Ensembles
  Stacking Ensembles
CONCLUSION
INTRODUCTION

This book is a step-by-step guide through intermediate machine learning concepts and techniques. You'll also learn to work with complex data, as any machine learning technology requires data. The bulk of the work in this book will be communicated with clear examples. This is great news if you are the type that learns best from examples.

Since this is an intermediate guide, there is a lot of assumed knowledge on our part. We expect you to know machine learning basics and Python. The act of publishing a book like this is always about simplifying things so anyone can learn. So, if you aren't sure you have the basics down, you can still have a look and do some extra research when you come across concepts that are new to you. The information should otherwise be easy to digest.

Let's now talk about what you will learn. We will use unsupervised machine learning algorithms and tools for analyzing complex datasets. That means you will learn about principal component analysis, k-means clustering and more. If this sounds strange and new, that is okay; it's why we are here. You don't have to know what any of this means at this point. Again, all of this will be accompanied by practical examples.

Then we will learn about restricted Boltzmann machine algorithms and deep belief networks. These will be followed by convolutional neural networks, autoencoders, feature engineering and ensemble techniques. Each chapter will begin by explaining, in general terms, the theory behind these techniques.
As a general overarching rule, practice the concepts in this book. That is how you will benefit the most from the lessons in this book. You might find some parts challenging. Don't just steam ahead. Try to find extra material that will help you understand the concept, or go over the material again. Only begin the practicals when you are somewhat confident in your understanding. This is important because, if you don't do the work, you won't understand the more advanced concepts.

Each chapter will be structured to include theory, tools and examples of real-world applications.
CHAPTER 1: UNSUPERVISED MACHINE LEARNING

Unsupervised machine learning is made up of a set of techniques and tools crucial to exploratory analysis. Understanding these tools and techniques is important to extracting valuable data from complex datasets. These tools help reveal patterns and structures in data which are hard to discern otherwise.

That is what we will do in this chapter. We will begin with a solid data manipulation technique called principal component analysis. Then we will quickly look at k-means clustering and self-organizing maps. We will then learn how to use these techniques on the UCI Handwritten Digits dataset. Let's get to it.
PRINCIPAL COMPONENT ANALYSIS

PCA is arguably the most popular linear dimensionality reduction method used in big data analytics. Its aim is to reduce the dimensionality of data so it becomes easy to manage.

PCA is a decomposition method that is good at splitting a multivariate dataset into orthogonal components. Those components will become the summary of the dataset, allowing for insights.

It does this in a few steps: by identifying the dataset's center point, calculating the covariance matrix and the eigenvectors of the matrix. Then it ortho-normalizes the eigenvectors and calculates the proportion of variance represented by the eigenvectors. Since you have likely never heard any of these terms, it is worth going in and explaining them further.

1. Covariance: This is a variance between two or more variables, applying to multiple dimensions. Say we have a covariance between two variables; we'd use a 2 x 2 matrix to describe it. If there are 3 variables, we'll need a 3 x 3 matrix, and on it goes. The first phase of any PCA calculation is the covariance matrix.
2. Eigenvector: This vector doesn't change direction when a linear transformation is applied. Let's illustrate this. Imagine holding an elastic rubber band between your hands. Then you stretch the rubber band. The eigenvector would be the point in the band that did not move when you were stretching it. It is the point in the middle that stays in the same place before and after you stretch the band.
3. Orthogonalization: The term means two vectors that are at right angles to each other, simply referred to as orthogonal.
4. Eigenvalue: The eigenvalue measures the proportion of variance represented by its eigenvector. The eigenvalue corresponds, more or less, to the length of the eigenvector.
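The procedure just described (find the center, build the covariance matrix, take its eigenvectors, ortho-normalize them, and compute each one's share of the variance) can be sketched directly in NumPy. This is an illustrative toy implementation; the function name `pca_sketch` and the random test data are my own invention, not the book's code:

```python
import numpy as np

def pca_sketch(X, n_components):
    """Toy PCA: center the data, eigendecompose the covariance matrix."""
    X_centered = X - X.mean(axis=0)           # step 1: identify the center point
    cov = np.cov(X_centered, rowvar=False)    # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # steps 3-4: orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]         # sort by eigenvalue, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratios = eigvals / eigvals.sum()          # step 5: proportion of variance
    return X_centered @ eigvecs[:, :n_components], ratios[:n_components]

# five features, one of which dominates the variance
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 0] *= 10
Z, ratios = pca_sketch(X, 2)
print(Z.shape)   # the 5 features are reduced to 2 components
print(ratios)    # the first component carries most of the variance
```

Because the dominant feature holds nearly all the variance, the first variance ratio comes out close to 1, which is exactly the "explanatory power" ordering the chapter relies on later.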
Here's a short summary: covariance is used to calculate eigenvectors, and then ortho-normalization takes place. This process describes how principal component analysis transforms complex datasets into low-dimensional ones.

Applying PCA

Now let's see how the algorithm works in action. As we've said, we will use the UCI handwritten digits dataset. You can import it using Scikit-learn because it is an open-source dataset. The dataset has about 1800 instances of handwritten digits from about 50 writers. The input is comprised of pressure and location, resampled on an 8 x 8 grid. This yields maps that can be turned into 64-feature vectors, which will be used for analysis. We use PCA on them because we need to reduce their number, making them more manageable. Here is how the code looks:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
data = digits.data
n_samples, n_features = data.shape
n_digits = len(np.unique(digits.target))
labels = digits.target

Let's talk about what the code does:
1. The first thing we do is import the libraries, components and the dataset we will need.
2. We retrieve the data and then make a data variable that stores the digit images. The target vector is saved as labels.

Now we can begin applying the PCA algorithm:

pca = PCA(n_components=10)
data_r = pca.fit(data).transform(data)
print('explained variance ratio (first ten components): %s'
      % str(pca.explained_variance_ratio_))
print('sum of explained variance (first ten components): %s'
      % str(sum(pca.explained_variance_ratio_)))

The code will give us the variance explained by each component, ordered by explanatory power. Our result is a variance of 0.589. We've cut down from 64 variables to 10 components. That's a big improvement. PCA will result in some information being lost, but when you weigh the disadvantages against the advantages, the advantages win out. Let's illustrate with visualizations. We have "data_r", which contains the output. We will add a color vector so all classes stand out in the scatter plot. Use the following code to get it:

colors = cm.rainbow(np.linspace(0, 1, n_digits))
plt.figure()
for c, i in zip(colors, range(n_digits)):
    plt.scatter(data_r[labels == i, 0], data_r[labels == i, 1],
                color=c, alpha=0.4, label=str(i))
plt.legend()
plt.title('Scatterplot of Points')
plt.show()

What conclusion can we draw from this? As you can see in the scatterplot, the first two components do not give a clean class separation. That tells us it will be difficult to make accurate classifications using the dataset. Despite that, you can see that the classes are clustered in a way that allows us to get some accurate results through clustering analysis. PCA has given us a hint about the structure of the dataset, and we can probe it further using other methods. Let's perform that analysis through k-means clustering.
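Before the library version of k-means, it may help to see the bare loop written out: pick k initial points, assign every observation to its nearest center, recompute each center as the mean of its cluster, and repeat until nothing moves. This is a toy sketch; the function name `kmeans_sketch` and the two-blob test data are my own, not the book's code:

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=20, seed=0):
    """Toy k-means: random initial centers, assign, recompute, repeat."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # each center becomes the mean of the points assigned to it
        new_centers = np.array([X[assign == j].mean(axis=0)
                                if np.any(assign == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # converged: nothing moved
            break
        centers = new_centers
    return centers, assign

# two well-separated blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centers, assign = kmeans_sketch(X, 2)
```

On well-separated data like this the loop converges in a handful of iterations, which is the "ridiculously fast" behavior the next section points out.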
K-MEANS CLUSTERING

We've said unsupervised machine learning algorithms are great for gleaning information from very complex datasets. These algorithms are a huge time-saver for data analysts who are trying to extract data from a complicated dataset. Now, let's take that a step further and look at clustering algorithms.

Clustering is maybe the core unsupervised machine learning method because it focuses on optimization and efficient implementation. This algorithm is ridiculously fast. The most popular clustering technique is "k-means."

k-means builds clusters by arbitrarily initiating them as k-many points. Each point in the data is assigned to the cluster with the nearest mean. Each cluster has a center; the mean of the assigned points becomes the new center, making the means change their positions. After a number of iterations, the cluster centers will have moved into positions that minimize the performance metric. The algorithm has a solution when that happens. It also means observations are no longer being reassigned. Let's look at k-means in code, and let's compare it with principal component analysis.

from time import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale

np.random.seed(42)

digits = load_digits()
data = scale(digits.data)
n_samples, n_features = data.shape
n_digits = len(np.unique(digits.target))
labels = digits.target

sample_size = 300

print("n_digits: %d, \t n_samples %d, \t n_features %d"
      % (n_digits, n_samples, n_features))
print(79 * '_')
print('init         time  inertia  homo   compl  v-meas  ARI    AMI    silhouette')

def bench_k_means(estimator, name, data):
    t0 = time()
    estimator.fit(data)
    print('% 9s  %.2fs  %i  %.3f  %.3f  %.3f  %.3f  %.3f  %.3f'
          % (name, (time() - t0), estimator.inertia_,
             metrics.homogeneity_score(labels, estimator.labels_),
             metrics.completeness_score(labels, estimator.labels_),
             metrics.v_measure_score(labels, estimator.labels_),
             metrics.adjusted_rand_score(labels, estimator.labels_),
             metrics.adjusted_mutual_info_score(labels, estimator.labels_),
             metrics.silhouette_score(data, estimator.labels_,
                                      metric='euclidean',
                                      sample_size=sample_size)))

So, how do the PCA code and the k-means code differ? The main difference is that we start by scaling the values within the dataset.
Why? Because if we don't, we might have disproportionate feature values that can have unpredictable side-effects on the entire dataset. Clustering algorithms like these are only successful when they can interpret the way the data is grouped. Let's look at the performance measures we have in the code so we can better understand clustering:

1. Homogeneity score: Measures whether each cluster contains only measurements of a single class. It runs from zero to one. Values closer to zero tell us the sample has low homogeneity, while those at the other end of the spectrum tell us each cluster draws from a single class.
2. Completeness score: This score complements the homogeneity score by giving us information on whether all measurements of the same class are assigned to the same cluster. Put together with a homogeneity score, we will be able to tell if we have a perfect clustering solution.
3. V-measure: The harmonic mean of the homogeneity score and the completeness score. It has a scaled measure of zero to one, which assesses the homogeneity and completeness of the clustering.
4. Adjusted Rand Index score: Measures similarity in the labeling on a zero to one scale. Applied to clustering, this measures the agreement between the assignment sets.
5. Silhouette score: Measures the performance of the clustering without using labeled data. The score is on a scale of -1 to 1 and tells us if the clusters are well defined. Incorrect clustering gives -1, highly defined clustering gives 1, and a score gravitating towards 0 tells us there is some overlap between clusters.

In the k-means clustering example, all these scores give us the clustering measurements. Let's use the "bench_k_means" function to analyze the results.
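As an aside, the five scores above correspond directly to functions in sklearn.metrics, so they can be tried out in isolation. A small self-contained illustration on invented labels (the toy truth/cluster arrays are mine, not the digits results):

```python
import numpy as np
from sklearn import metrics

# ground truth has two classes; the clustering splits class 0 across two clusters
truth = [0, 0, 0, 0, 1, 1, 1, 1]
cluster = [0, 0, 1, 1, 2, 2, 2, 2]

print('homogeneity :', metrics.homogeneity_score(truth, cluster))   # every cluster is pure
print('completeness:', metrics.completeness_score(truth, cluster))  # class 0 is split, so < 1
print('v-measure   :', metrics.v_measure_score(truth, cluster))
print('ARI         :', metrics.adjusted_rand_score(truth, cluster))

# the silhouette score needs the data points themselves, not the true labels
X = np.array([[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]])
print('silhouette  :', metrics.silhouette_score(X, cluster))
```

Note how homogeneity and completeness pull apart here: each cluster is pure (homogeneity 1), yet class 0 is scattered across two clusters, so completeness drops, which is exactly why the two scores are read together.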
The following code should do it:

bench_k_means(KMeans(init='k-means++', n_clusters=n_digits, n_init=10),
              name="k-means++", data=data)
print(79 * '_')

Here's how the results should look:

n_digits: 10, n_samples 1797, n_features 64
init       time   inertia  homo   compl  v-meas  ARI    AMI    silhouette
k-means++  0.25s  69517    0.596  0.643  0.619   0.465  0.592  0.123

Let's discuss these scores briefly. Firstly, I should mention the dataset has a lot of noise; the low silhouette score tells us this. Secondly, we can conclude from the homogeneity score that the cluster centers aren't well resolved. The v-measure and ARI are also not very impressive.

We can say that these results need to be improved. We can get better results by applying PCA on top of the k-means clustering. When we reduce the dataset's dimensionality, we should get better results. Let's apply the PCA algorithm to see if it works:

pca = PCA(n_components=n_digits).fit(data)
bench_k_means(KMeans(init=pca.components_, n_clusters=10),
              name="PCA-based", data=data)

We applied PCA to the dataset so we would have the same number of principal components as classes. See if the results have improved below:

n_digits: 10, n_samples 1797, n_features 64

init       time  inertia  homo   compl  v-meas  ARI    silhouette
PCA-based  0.2s  71820    0.673  0.715  0.693   0.567  0.121

These results aren't that great either, because the silhouette score is still low. But as you can see, the other scores have greatly improved, especially v-measure and ARI. This shows how applying PCA can improve our analysis.

Fine-tuning

What we have done so far is use k-means clustering to understand clustering analysis and performance metrics. Our next step is fine-tuning our configurations so we can get better results. In the real world, you will do this a lot. Let's see how it is done by modifying the k-value.

You will be tempted to modify the k-value randomly and see which one provides the best results. That won't tell you which value works best. The problem is that, when you increase the value, there's a possibility you will end up lowering the silhouette score without getting meaningful clusters. Suppose the k-value is set to the number of observations in the sample: each point will form its own cluster, and the silhouette score will be low. So, you won't gain any useful results.

To solve this, we can use the "elbow method," a simple but effective technique for selecting the optimal number of clusters. Usually, as the k-value is increased, the improvement in distortion decreases. The elbow method lets us find the point, known as the elbow point, where the improvement in distortion drops off. This is the point where we stop dividing our data into more clusters. The elbow
point is called so because the curve resembles a bent arm, with the elbow being the optimal point.

Below is an example of how we can use it. In the example, we use the elbow method after we have applied PCA. The order is important because of PCA's dimensionality reduction.

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

digits = load_digits()
data = scale(digits.data)
n_samples, n_features = data.shape
n_digits = len(np.unique(digits.target))
labels = digits.target

K = range(1, 20)
meandistortions = []
reduced_data = PCA(n_components=2).fit_transform(data)
for k in K:
    kmeans = KMeans(init='k-means++', n_clusters=k, n_init=k)
    kmeans.fit(reduced_data)
    meandistortions.append(
        sum(np.min(cdist(reduced_data, kmeans.cluster_centers_, 'euclidean'),
                   axis=1)) / data.shape[0])

plt.plot(K, meandistortions, 'bx-')
plt.show()

It's worth pointing out that elbow points aren't easy to spot. The dataset we are using will produce a less pronounced, more gradual progression, which is caused by the overlap of classes.

Visual plot verification is easy to perform and interpret, but it has the disadvantage of being a manual validation technique. What we need is something automated instead. In that regard we are in luck, because there's a code-based technique called cross-validation, also known as v-fold, which can help us.

Cross-validation Technique

Cross-validation involves splitting the dataset into segments. We put the test set aside and focus on training the model on the training data. Let's see what happens when we use it with the "digits" dataset.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

digits = load_digits()
data = scale(digits.data)
n_samples, n_features =
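The preview cuts off in the middle of this listing. As a hedged sketch of where the idea leads (not the book's code: the `sklearn.cross_validation` module of older scikit-learn releases was replaced by `sklearn.model_selection`, and the split ratio, seeds, and choice of v-measure as the score here are my own assumptions), holding out a test split and scoring k-means on it might look like this:

```python
import numpy as np
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

digits = load_digits()
data = scale(digits.data)

# hold a quarter of the samples out; fit k-means on the training portion only
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.25, random_state=0)

kmeans = KMeans(init='k-means++', n_clusters=10, n_init=10, random_state=0)
kmeans.fit(X_train)

# score the held-out fold: do its cluster assignments line up with true labels?
held_out_v = metrics.v_measure_score(y_test, kmeans.predict(X_test))
print('held-out v-measure: %.3f' % held_out_v)
```

Scoring on a fold the model never saw is what makes the validation automated and honest: a k that only looks good on the training portion will show up with a weaker held-out score.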
The above is a preview of the first 20 pages. Register to read the complete e-book.