Introduction to Programming for Researchers Learning Programming Fundamentals Through Dataset Processing in Bash and Python — James R. Derry
Introduction to Programming for Researchers
James R. Derry Introduction to Programming for Researchers Learning Programming Fundamentals Through Dataset Processing in Bash and Python
James R. Derry Austin, TX, USA ISBN-13 (pbk): 979-8-8688-1614-7 ISBN-13 (electronic): 979-8-8688-1615-4 https://doi.org/10.1007/979-8-8688-1615-4 Copyright © 2026 by James R. Derry This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Managing Director, Apress Media LLC: Welmoed Spahr Acquisitions Editor: Celestin Suresh John Development Editor: James Markham Coordinating Editor: Kripa Joseph Cover designed by eStudioCalamar Cover image designed by TyliJura on pixabay.com Distributed to the book trade worldwide by Springer Science+Business Media New York, 1 New York Plaza, New York, NY 10004. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a Delaware LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation. For information on translations, please e-mail booktranslations@springernature.com; for reprint, paperback, or audio rights, please e-mail bookpermissions@springernature.com. Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales web page at http://www.apress.com/ bulk-sales. Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub. For more detailed information, please visit https://www.apress.com/gp/services/source-code. If disposing of this product, please recycle the paper
To My Students, Who Shaped the Class and Made It What It Is
Contents About the Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv About the Technical Reviewers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Modern Computers and Their History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.1 Today, Most Personal Computers Are Used Primarily As Communication Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.2 A Brief History of the Computer Age . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Our Modern Idea of Computers: A Theory of Computation . . . . . . . . . . . . . . . . . . . 11 2 Digital Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.1 Fundamentals of Computation I: Transistors, Logic Gates, and Moore’s Law . . . . . 15 2.1.1 Transistors and Moore’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.1.2 The Transistor Is the Fundamental Physical Unit of Computation . . . . . . 16 2.1.3 Transistors Are Organized into Logic Gates . . . . . . . . . . . . . . . . . . . . . . . . 17 2.1.4 Moore’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.1.5 The Transistor Budget . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.2 Fundamentals of Computation II: Bits, Boolean Logic, and the Digital Age of George Boole & Claude Shannon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.2.1 George Boole . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.2.2 Claude Shannon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.3 Fundamentals of Computation III: Data, Instructions, & Pointers . . . . . . . . . . . . . . . 22 2.4 Code and Data: Cycles of Fetch, Decode, and Execute, Over and Over . . . . . . . . . . 24 3 Operating Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.1 Operating Systems and Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.1.1 A Brief History of Operating Systems, with an Emphasis on UNIX . . . . 28 3.2 The UNIX/Linux Filesystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.3 The Memory Manager and Process Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.4 Working with Datafiles in Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.5 An Introduction to the Process Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4 Introduction to Bash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.1 The Bash Shell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2 Changing the Bash Prompt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 vii
viii Contents 4.3 Navigating Bash History and the LINUX Filesystem . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.4 Files in LINUX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.4.1 Clobbering a File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.4.2 Changing File Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.4.3 Setting the Session noclobber Flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.5 Character Encoding and Text File Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.6 The UNIX Philosophy: An Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.7 The Bash Interpreter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.7.1 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.7.2 The Bash Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4.7.3 The Bash Interpreter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.7.4 The Symbol Table and Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.8 Some Bash Tools for Working with Datafiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.9 The General-Purpose Bash Commands time and watch . . . . . . . . . . . . . . . . . . . . . . . 54 4.10 The Bash Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.11 Some Advice for Writing Bash Scripts and Pipelines . . . . . . . . . . . . . . . . . . . . . . . . . 57 5 Bash: Combining Commands and Variables to Make Pipelines and Scripts to Process Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.1 Introduction to vim and the Bash script: How to Make an Executable Script That Runs On the Command Line & Takes Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.1.1 Linting Our Executable Script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 5.1.2 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5.2 tr Command: Translate or Delete Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5.2.1 A Bash Pipeline Spellchecker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 5.2.2 A DOS to Linux Files Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 5.3 Extracting and Analyzing Data from Datafiles Using Bash Pipelines . . . . . . . . . . . . 82 5.4 gawk: How Many Named Stars Are There? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 5.5 Getting Resultsets from Bash Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 5.5.1 Advanced Subject: Format Printing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 5.6 How to Make an Executable Script That Prompts Users for Input . . . . . . . . . . . . . . 95 5.7 Datetime in Bash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 5.8 Introduction to Regular Expressions Using grep -E . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5.9 Words You Can Make on a Calculator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 5.10 Regular Expressions, sed, & tr: Reformatting Records In a Dataset . . . . . . . . . . . . . 105 5.11 Finding Approximate Matches with agrep . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.12 Write Once, Run Everywhere: Embedding Our Executable Scripts in Pipelines & Invoking Them in Other Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6 Algorithms and Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 6.1 An Introduction to Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 6.1.1 Recipes as Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 6.1.2 Definition of an Algorithm from The Art of Computer Programming . . . 118 6.1.3 Control Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 6.1.4 Euclid GCD Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 6.1.5 The Little Hummer Card Trick . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 6.1.6 The Bubblesort Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 6.1.7 Notes on the Mystical History of Algorithms . . . . . . . . . . . . . . . . . . . . . . . 129 6.2 An Introduction to Programming Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Contents ix 6.2.1 Richard Hamming on Programming Style, As Told by Brian Kernighan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 7 Floating-Point Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 7.1 Floating-Point Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 7.2 Working with Floats: Rules of Thumb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 7.3 Arbitrary Precision Math with mpmath . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 7.4 Improving the Accuracy of Floating-Point Calculations with Herbie . . . . . . . . . . . . 138 8 Introduction to Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 8.1 Python Primer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 8.1.1 Python Comes With Built-Ins: Built-In Functions, Built-In Data Types and Collections, and Built-In Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 8.1.2 Python Comes with a Built-In Error-Reporting Module Called the Traceback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 8.1.3 The IPython Interactive Shell I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 8.1.4 In Python, Everything Is an Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 9 Using Python As a Calculator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 9.1 Datetime I: Duration of the COVID-19 Pandemic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 9.2 Datetime II: Solving Date Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 9.3 Heat Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 9.4 Find the GC Content Percentage of a Nucleotide String . . . . . . . . . . . . . . . . . . . . . . . 155 9.5 Stoichiometry with SymPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 9.6 Ideal Gas Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 9.7 Work Performed by Expanding Gas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 9.8 Acceleration of Sun on Earth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 9.9 Sound Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 9.10 SymPy on Jupyter Notebooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 9.10.1 The Jupyter Notebook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 9.11 Falling Bodies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 9.12 Spherical Trigonometry in Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 9.13 Climate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 9.14 Digital Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 9.15 Image-Driven Data Analysis: Flood Mitigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 10 Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 10.1 Our First Program: Of Functions, Modules, Garbage Filters, Tests, & Docstrings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 10.1.1 The Least Necessary to Write a Working Function . . . . . . . . . . . . . . . . . . 191 10.1.2 The Least Necessary to Write a Useful Function . . . . . . . . . . . . . . . . . . . . 192 10.1.3 Documenting Our Function for Coders and Users . . . . . . . . . . . . . . . . . . . 193 10.1.4 Saving Our Function to a Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195 10.1.5 Autoreloading Edited Module Content to an IPython Session . . . . . . . . . 196 10.1.6 Writing Garbage Filters: Handling Bad Input . . . . . . . . . . . . . . . . . . . . . . . 198 10.1.7 Automating Testing Our Code Using Unit Tests . . . . . . . . . . . . . . . . . . . . . 200 10.1.8 Linting Our Code with Pylint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 10.2 Introduction to Programming I: Implementing Algorithms . . . . . . . . . . . . . . . . . . . . 206 10.2.1 The IPython Interactive Shell II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 10.2.2 Back to Python and Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
x Contents 10.3 Testing If Symbol in String Is Nucleotide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 10.4 Euclid GCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 10.5 Converting Decimal Fractions into Binary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 10.5.1 The More You Know: Brahmagupta (598–668CE) . . . . . . . . . . . . . . . . . . . 225 10.6 Introduction to Programming II: Unit Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 10.6.1 Unit Tests and Our unittest_template.py File . . . . . . . . . . . . . . . . . . . . . . . 225 10.7 Finding the Reverse Complement of a Nucleotide String . . . . . . . . . . . . . . . . . . . . . . 230 10.7.1 A Brief Introduction to the Python Dictionary . . . . . . . . . . . . . . . . . . . . . . 231 10.7.2 Developing Code by Test-Driven Development (TDD) . . . . . . . . . . . . . . . 234 10.8 Counting Symbols in a String . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238 10.8.1 Error-Trapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 10.9 IPython Magics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240 10.10 Introduction to Programming V: Good Programming Practices . . . . . . . . . . . . . . . . 241 10.11 Programs = Algorithms + Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 10.12 Interlude: Ilayda Develops Her Stoichiometry Code . . . . . . . . . . . . . . . . . . . . . . . . . . 244 11 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 11.1 Subroutines: The Genesis of Functions in Programming Languages . . . . . . . . . . . . 247 11.2 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248 11.3 Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 11.4 Documenting Your Functions, Making Them Robust with Error-Trapping and Exception Handling, and Proofing Them . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250 11.5 Positional vs. Named Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251 11.6 Multiple Return Values from a Python Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 11.7 Lambdas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255 11.8 Matters of Style When Writing Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256 11.9 A Calculus Primer: Numeric Integration & Differentiation Using Python . . . . . . . . 256 11.9.1 Numerical Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 11.9.2 The More You Know . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258 11.9.3 Back to Numerical Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 11.9.4 Using Python and Numpy for Numerical Integration . . . . . . . . . . . . . . . . . 262 11.9.5 Numerical Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264 11.9.6 Numerical Calculus with SymPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266 12 Software Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269 12.1 Writing Programs: Top-Down Design Methodology . . . . . . . . . . . . . . . . . . . . . . . . . 270 12.2 Writing Programs: Converting a Top-Down Design Into A Program of Subroutines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271 12.3 Writing Your Code Base As a Set of Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 12.4 Writing Programs: A Practical Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274 13 Working with Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277 13.1 Accessing the Tabular Contents of Datafiles in Python Using a List of Lists (LoL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277 13.2 Nested Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280 13.3 Finding the Minimum and Maximum Values in an Unsorted Collection . . . . . . . . . 281 13.4 Parsers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 13.4.1 Revisiting fasta Parsers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Contents xi 13.5 Too Big To Handle: Pre-Processing Large Datasets, Extracting Only Needed Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287 13.6 One Record per Text File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 14 Programming Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 14.1 The Analysis of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 14.2 O(n): Finding a Value in an Unsorted Collection of Index/Value Pairs . . . . . . . . . . . 295 14.3 O(nm): Nested Loops and Their Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . 297 14.3.1 Illustrating Executing Nested Loops with Nested Dolls . . . . . . . . . . . . . . 297 14.3.2 The Output from Running Our Example Nested Loop . . . . . . . . . . . . . . . 303 14.3.3 Can We Do Better? Algebra to the Rescue! . . . . . . . . . . . . . . . . . . . . . . . . 303 14.4 O(ln2 n): Binary Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304 14.5 Functional Equivalence and Profiling Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308 14.5.1 Dictionary As Lookup Table vs. Conditional Testing . . . . . . . . . . . . . . . . 308 14.6 Finding Min and Max Values in an Unsorted Collection II . . . . . . . . . . . . . . . . . . . . 309 14.7 Multiple Passes Through Dataset vs. Single Pass . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310 14.7.1 An Outline of the Problem and a Solution . . . . . . . . . . . . . . . . . . . . . . . . . . 310 14.7.2 Extending the Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 15 Other Subjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 15.1 An Introduction to Graph Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 15.1.1 Saving Our Python Collections by Pickling Them . . . . . . . . . . . . . . . . . . . 322 15.2 Writing Python Scripts That Write Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 15.3 Interactive Scripts That Prompt Users for Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 15.4 The Python Half-Open Interval, Range Objects, and Slicing . . . . . . . . . . . . . . . . . . . 328 15.4.1 The Python Half-Open Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328 15.4.2 The Range Object Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 15.5 Finding Intervals with Overlap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 15.6 Finding Interval Overlap in Genomic Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332 15.7 Slicing Lists and Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338 15.8 From Nucleotide String to Amino Acid Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339 15.9 Comprehension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342 15.9.1 Slicing LoL, Extracting Columns with Comprehension . . . . . . . . . . . . . . 343 15.10 The Sieve of Eratosthenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344 15.11 Transposing a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345 15.11.1 Transposing Tabular Datasets in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 15.12 Stacks and Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348 15.12.1 Stacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348 15.12.2 Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349 15.12.3 Algorithm: The Josephus Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350 15.13 Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354 15.13.1 A Few Recursive Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356 16 SciPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359 16.1 matplotlib: Graphics with SciPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360 16.2 NetworkX: Working with Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366 16.3 NumPy: Foundational Library of SciPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368 16.3.1 The ndarray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368 16.3.2 Linear Algebra Has Three Objects: Scalar, Vector, and Matrix . . . . . . . . . 369
xii Contents 16.3.3 Single-Instruction Multiple Data Registers in CPUs . . . . . . . . . . . . . . . . . 370 16.3.4 Universal Functions (ufuncs) and Vectorized Operations . . . . . . . . . . . . . 370 16.3.5 Row and Column Vectors in Memory and Data Processing . . . . . . . . . . . 371 16.4 Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372 16.4.1 Datasets As Matrices I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372 16.4.2 Making a Rotatable 3D Graph from Data in a Dataset . . . . . . . . . . . . . . . . 376 16.4.3 Datasets As Matrices II: Partitioning a Matrix . . . . . . . . . . . . . . . . . . . . . . 379 16.5 Pandas: Working with Labeled Datasets in Pandas . . . . . . . . . . . . . . . . . . . . . . . . . . . 382 16.6 Pandas: Using Masks to Query Recordsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384 16.7 Pandas: Getting Statistics on Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385 16.7.1 Using Seaborn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388 16.8 Pandas: Processing Datasets Programmatically . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389 16.9 SymPy: Symbolic Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391 17 Odds and Ends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 17.1 Writing Programs: Writing, Rewriting, and Matters of Style . . . . . . . . . . . . . . . . . . . 395 17.2 Python Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396 17.3 Datetime in Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 17.3.1 Filling In Missing Datetime Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 17.4 Introduction to Parallel Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402 18 Writing a Large Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407 18.1 Putting It All Together: Solving Triangles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407 18.2 Design Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409 18.3 Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410 18.4 Organization of the Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410 18.5 Test-Driven Development (TDD) and Unit Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 18.6 Linting Our Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 18.7 Pencil to Paper: Our Top-Down Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412 18.8 Our First Unit Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413 18.9 Our First Draft of solve_triangle.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414 18.10 Running Our First Unit Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415 18.11 Running solve_triangle Function the First Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416 18.12 A Note on Structured Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417 18.13 Adding Design Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417 18.14 Writing Our First Draft . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420 18.15 Testing Our Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422 18.16 The Garbage Filter: Writing Unit Tests and Coding . . . . . . . . . . . . . . . . . . . . . . . . . . 424 18.17 Finishing solve_triangle(), v1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426 18.18 Improving solve_triangle(), v1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426 18.19 TI-59 Calculator: Triangle Solution, Master Library ROM Module—A Different Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427 18.20 Losing the rnd Bool from the Argument List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428 18.21 Rethinking the Main Section of solve_triangle() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429 18.22 solve_triangle(), v2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431 18.23 What’s Left to Do? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
Contents xiii Suggested Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437 Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441 Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442 Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443 Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444 Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444 Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444 Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445 Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445 Chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445 Chapter 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446 Chapter 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446 Chapter 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446 Chapter 14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446 Chapter 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446 Chapter 16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447 Chapter 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
About the Author James R. Derry is a seasoned professional with 32 years of experience at UT-Austin, where he has served as a Senior Systems Administrator. For the past 14 years, he has also been an educator, teaching the course “Introduction to Programming for Researchers.” His extensive background in both systems administration and programming education uniquely positions him to impart valuable computational skills to researchers in natural sciences and engineering, enhancing their productivity and efficiency in handling complex datasets. xv
About the Technical Reviewers Karanbir Singh is an accomplished Engineering Leader with over 7 years of experience leading AI/ML engineering, distributed systems, and microservices projects across diverse industries, including fintech and automotive. Currently working as a Senior Software Engineer at Salesforce, he focuses on backend technologies as well as AI. He specializes in time series analysis, RAGs, and AI agents. His career has been marked by a commitment to building high-performing teams, driving technological innovation, and delivering impactful solutions that enhance business outcomes. At TrueML, as an Engineering Manager, he managed a critical team to develop and deploy machine learning models in production. He successfully expanded and led engineering teams, significantly im- proving feature development velocity and client engagement through strategic collaboration and mentorship. His leadership directly con- tributed to increased revenue, client retention, and substantial cost savings through innovative internal solutions. His role involved not only steering technical projects but also shaping the company’s roadmap in partnership with data science, product management, and platform teams. Previously, at Lucid Motors and Poynt, he developed critical com- ponents and integrations that advanced product capabilities and strengthened industry partnerships. His technical expertise spans across AI/ML, cloud computing, and software architecture, and he is adept at utilizing cutting-edge technologies and methodologies to drive results. Karanbir holds a master’s degree in Computer Software Engineering from San Jose State University and has been recognized for his inno- vative contributions, including winning the Silicon Valley Innovation Challenge. He is passionate about mentoring and coaching emerging talent and thrives in environments where he can leverage his skills to solve complex problems and advance technological initiatives. xvii
xviii About the Technical Reviewers Krishnendu Dasgupta is a computer science engineer with nearly 14 years of experience. He is currently working on vision language models and distributed and decentralized inference, with a strong focus on healthcare and advanced computing. He is a sole proprietor and the former Head of Machine Learning at Mondosano GmbH, where he led data science initiatives in clinical trial recommendations and patient health profiling using disease and drug data. With over a decade of experience in applied machine learning, he has held key roles at DOCONVID AI [Co-Founder], NTT DATA, PwC, and Thoucentric. His research interests span graph machine learning, medical imaging, and privacy-preserving AI solutions. He has also contributed to AI risk assessment research. He is an alumnus of the 5th Cohort of the MIT Entrepreneurship and Innovation Bootcamp (2018). He also volunteered as a Section Leader for Stanford Univer- sity’s Code in Place Initiative, teaching programming online for the 2024 cohort. In addition, he dedicates his time to voluntary research collaborations with research NGOs and universities.
Acknowledgements I wish to thank Dr. Jennifer Morgan, who was the first to underwrite this class so that I could teach it; Dr. Hans Hofmann, who next underwrote the class; and to Mark McFarland, who allowed me to continue teaching it once I was reassigned from the School of Biological Sciences to the Office of Information Technology, College of Natural Sciences. I wish to thank my colleagues for their unflagging support and suggestions to improve the Linux and Bash content, especially Eric Rostetter, Dr. Maorong Zou, and Hampton Finger. And I wish to thank the students who provided data and suggestions on how to make the class better, which I have discovered is itself a never-ending project. Your contributions made it more interesting and engaging than it would have been otherwise. xix
Introduction In 2011, I proposed to teach a class to biology grad students meant to prepare them for courses in the university catalog that have a substantial programming component. Over the next 14 years, the class has grown in audience and scope and evolved in approach. This book is a product of that class. If you are a student working toward a career in STEM, or a scientist or engineer who missed out on programming classes, and you’re looking for a book that uses a hands-on approach to teach you how to program in Python, this may be the book for you. We start with an introduction to Linux and Bash, and in Bash, we move quickly to writing small scripts and pipelines to process datasets. Some of the Bash tools that we learn, like sed and vim, we use throughout the book. After gaining a familiarity with Bash, we move on to learning algorithms and programming in Python. And programming in Python gives us access to a wealth of tools written and used by STEM researchers known collectively as SciPy. The bulk of this book, like the bulk of the class, deals with writing programs that use these tools to work with and process datasets. And as we learn how to use these tools, we will also learn about and incorporate best programming practices into our work, including communicating effectively within our code to users and other programmers, testing user input to ensure it is expected input, handling bad input and exceptions, organizing code into modules and functions, and linting and unit testing our code. Altogether, I hope to teach you how to write code that is not only correct but also robust, as easy to use as you can make it, and easy to maintain. And by the time you finish, a whole new set of tools will be available to you to analyze datasets, automate tasks, and improve efficiency. xxi
1Introduction With regard to computational might, most modern personal computers are greatly underutilized. In this brief introduction, we look at the history of the Computer Age, with a focus on their growth in power and capabilities. We finish by considering the universal Turing machine and comparing modern personal computers to the ideal UTM. 1.1 Modern Computers and Their History The purpose of computing is insight, not numbers. —Richard Hamming 1.1.1 Today, Most Personal Computers Are Used Primarily As Communication Devices Most personal computers today are not used to compute. They are used to access sites on the Internet, to send and receive email, to read articles posted on news websites, to chat in real time audio-visually with family and friends all over the world, and to exchange posts on social media platforms. In other words, most personal computers today are used as communication devices. And yet, most personal computers today, benefitting from market demand and tremendous progress in the manufacture of computer components, have capacities in mass storage, memory, and computation cycles to serve as much more than mere communication devices. When used as computation devices to process datasets, most personal computers today can do in microseconds what researchers need months to do when working by eye and hand alone—or even when using a spreadsheet application (but not automating that work with the app’s ability to run scripts). It is the speed and ability of computers to execute repetitive tasks without error that make them ideal tools to process datasets, and this is how we will learn how to use them. In other words, we will be using our personal computers as the first computers of the Computer Age were used—as computation devices. © James R. Derry 2026 J. R. Derry, Introduction to Programming for Researchers, https://doi.org/10.1007/979-8-8688-1615-4_1 1
2 1 Introduction 1.1.2 A Brief History of the Computer Age Before the Computer Age, the word “computer” applied to humans, either for their mathematical abilities or their job.∗ The word was later applied to machines as a metaphor or analogy to describe to lay audiences the kind of work these machines performed. For years after the rise of these machines, the word “computer” still applied to clerical staff who did mathematical computations, often with slide rules and mechanical calculators. Figure 1-1 The Harvard Observatory computers with the director of the observatory, Charles Pickering. May, 1913. File: Edward Charles Pickering & His Computers, 13 May 1913. Public domain Famous among the computer staffs of universities are the Harvard Observatory computers; and for their achievements, they deserve special mention. The story goes that the observatory’s all-male staff of computers made so many errors in cataloging and computation that Pickering fired them and put his housemaid, Williamina Fleming, in charge of hiring and managing a new staff of computers. She hired all women. ∗ Alan Turing in his doctoral thesis (1936), in which he invents the theoretical machine that would later be called a “Turing Machine,” refers to such a person as a “computer.” Ex gratis, Turing’s first references read: “The behaviour of the computer at any moment is determined by the symbols which he is observing, and his ‘state of mind’ at that moment. We may suppose that there is a bound B to the number of symbols or squares which the computer can observe at one moment. If he wishes to observe more, he must use successive observations. We will also suppose that the number of states of mind which need be taken into account is finite.” A.M. Turing. “On Computable Numbers, with an Application to the Entscheidungsproblem”. Proceedings of London Math Soc., 1937. pg 250, 251, 252, 253, and 254.
1.1 Modern Computers and Their History 3 Not only did the women prove themselves adept at the clerical tasks of computation and cataloging; but from their experience, they also developed such expertise in the subject that they went on to make significant contributions in astronomy. Indeed, they are now considered astronomers. Most notable among them are Annie Jump Cannon and Henrietta Swan Leavitt. Harvard Observatory was an early pioneer in the use of photography in astronomy, a marriage of technologies that allowed astronomers to capture permanent images of the observable universe for cataloging and more objective observations than could be done by a single astronomer staring into a telescope’s eyepiece while scribbling notes and sketches. As keepers and catalogers of Harvard Observatory’s photographic plates, Annie Jump Cannon de- veloped our current stellar classification scheme, while Henrietta Swan Leavitt discovered the period- luminosity relationship for Cepheid variables that astronomers still use to determine interstellar, even intergalactic, distances. In his essay “Mathematical Creation,” Henri Poincaré wrote: “Every good mathematician ought to be a good chess player, and inversely; likewise he should be a good computer. Of course that sometimes happens; thus Gauss was at the same time a geometer of genius and a very precocious and accurate computer.”a aGhiselin, Brewster. The Creative Process: A Symposium. CALIFORNIA PAPERBACK EDITION, 1985. pg 23. World War II by convention marks the start of the Computer Age, when two computation machines were built with no moving parts to do the computation—all computation was done electrically. Both were built to support the war efforts of Allied nations. The first, Colossus, was built by the British at Bletchley Park to decipher messages encoded by Lorenz cipher machines. The second, ENIAC, was built by the Americans to calculate artillery firing tables. Before the Computer Age, computation machines used moving parts to do their computation. Alan Turing’s famous Enigma-decoding bombes are a well-known example. To decrypt Enigma messages, Alan Turing invented an electromechanical device called a “bombe,” which contained a spinning cylinder, simulating the action of an Enigma rotor. In other words, moving parts were central to the bombe’s ability to decipher Enigma-encoded messages. Built by Tommy Flowers (team leader for the Mark I) and Allen Coombs (team leader for the Mark II), Colossus is the first computer whose computations required no moving parts and is considered the first computer of the Computer Age. It was built in Bletchley Park, which also housed Turing’s bombes, but it was so classified that not even Alan Turing knew of its existence. It is believed that as many as eight Colossi were built though most were decommissioned and destroyed after the war to preserve the secret of their existence. There was a compelling reason to do this. Although Colossus is the first all-electrical computer built and put to use, popular myth holds that ENIAC was the first machine to usher in the Computer Age. The reason for this confusion is that at the end of the war, the Americans were able to declassify ENIAC and announce its existence to the world; whereas during the Cold War, the British used Colossus and its successors to decipher Soviet communications sent between Lorenz cipher machines throughout most of the Cold War. These were Lorenz cipher machines that the Soviet Army had captured as it overran Wehrmacht strategic commands. The Soviet government, not having been told that the British had cracked the Lorenz ciphers and believing them to be uncrackable, put these machines to use in Soviet embassies throughout Eastern Europe and used them in daily radio communiqués to Moscow.
Loading comments...
Reply to Comment
Edit Comment