Machine Learning for Hackers
Drew Conway and John Myles White
Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo
Machine Learning for Hackers
by Drew Conway and John Myles White
Copyright © 2012 Drew Conway and John Myles White. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions
are also available for most titles (http://my.safaribooksonline.com). For more information, contact our
corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.
Editor: Julie Steele
Production Editor: Melanie Yarbrough
Copyeditor: Genevieve d’Entremont
Proofreader: Teresa Horton
Indexer: Angela Howard
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano
February 2012: First Edition.
Revision History for the First Edition:
2012-02-06 First release
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of
O’Reilly Media, Inc. Machine Learning for Hackers, the cover image of a griffon vulture, and related
trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a
trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume
no responsibility for errors or omissions, or for damages resulting from the use of the information
contained herein.
ISBN: 978-1-449-30371-6
[LSI]
1328629742
Table of Contents
Preface

1. Using R
    R for Machine Learning
    Downloading and Installing R
    IDEs and Text Editors
    Loading and Installing R Packages
    R Basics for Machine Learning
    Further Reading on R

2. Data Exploration
    Exploration versus Confirmation
    What Is Data?
    Inferring the Types of Columns in Your Data
    Inferring Meaning
    Numeric Summaries
    Means, Medians, and Modes
    Quantiles
    Standard Deviations and Variances
    Exploratory Data Visualization
    Visualizing the Relationships Between Columns

3. Classification: Spam Filtering
    This or That: Binary Classification
    Moving Gently into Conditional Probability
    Writing Our First Bayesian Spam Classifier
    Defining the Classifier and Testing It with Hard Ham
    Testing the Classifier Against All Email Types
    Improving the Results