Deep Learning from Scratch: Theory and Implementation with Python (斋藤康毅 / Koki Saitoh) (Z-Library)
Author: Koki Saitoh (斋藤康毅)
Category: Business
This book is a true introduction to deep learning, explaining its principles and related techniques in an accessible yet thorough way. Using Python 3 and relying as little as possible on external libraries or tools, it starts from basic mathematics and guides the reader through building a classic deep learning network from scratch, developing an understanding of deep learning step by step along the way. Beyond fundamentals such as the concepts and characteristics of deep learning and neural networks, it covers backpropagation and convolutional neural networks in depth. It also introduces practical techniques for deep learning; applications such as autonomous driving, image generation, and reinforcement learning; and "why" questions such as why adding layers improves recognition accuracy.
📄 File Format:
PDF
💾 File Size:
10.7 MB
📄 Text Preview (First 20 pages)
Turing Community e-books do not use a proprietary client; you can read them on any device with the browser or PDF reader of your choice. However, an e-book you purchase is for your personal use only and may not be distributed without authorization. We trust that readers have the conscience and awareness to join us in protecting intellectual property. If a purchaser infringes copyright, we may take enforcement measures against that user, including but not limited to closing the account, and may pursue legal liability.
Turing Programming Series
Deep Learning from Scratch: Theory and Implementation with Python
By Koki Saitoh (斋藤康毅); translated by Lu Yujie (陆宇杰)
Posts & Telecom Press, Beijing
Published under license from O'Reilly Japan, Inc. (Beijing・Boston・Farnham・Sebastopol・Tokyo)
Deep Learning from Scratch
Synopsis
This book is a true introduction to deep learning, explaining its principles and related techniques in an accessible yet thorough way. Using Python 3 and relying as little as possible on external libraries or tools, it guides the reader through building a classic deep learning network from scratch, developing an understanding of deep learning step by step along the way. Beyond fundamentals such as the concepts and characteristics of deep learning and neural networks, it covers backpropagation and convolutional neural networks in depth. It also introduces practical training techniques; applications such as autonomous driving, image generation, and reinforcement learning; and "why" questions such as why adding layers improves recognition accuracy. The book is suitable for beginners in deep learning and can also serve as a university textbook.

Author: Koki Saitoh. Translator: Lu Yujie. Editor in charge: Du Xiaojing. Executive editor: Liu Xiangdi. Print production: Zhou Shengliang.
Published and distributed by Posts & Telecom Press, 11 Chengshousi Road, Fengtai District, Beijing 100164. E-mail: 315@ptpress.com.cn. Website: http://www.ptpress.com.cn. Printed in Beijing.
Format: 880×1230 1/32. Printed sheets: 9.625. Word count: 300,000. First edition, July 2018; first printing, Beijing, July 2018; print run: 1–4,000 copies.
Copyright contract registration number: 01-2017-0526. Price: 59.00 yuan.
Reader service hotline: (010) 51095186 ext. 600. Printing quality hotline: (010) 81055316. Anti-piracy hotline: (010) 81055315. Advertising license: 京东工商广登字20170147号.
Cataloging in Publication (CIP) data: 深度学习入门:基于Python的理论与实现 / [日] 斋藤康毅著; 陆宇杰译. Beijing: Posts & Telecom Press, 2018.7 (Turing Programming Series). ISBN 978-7-115-48558-8. China Version Library CIP data record no. (2018) 112509.
Copyright Notice
Copyright © 2016 Koki Saitoh, O'Reilly Japan, Inc. Posts and Telecommunications Press, 2018. Authorized translation of the Japanese edition of "Deep Learning from Scratch" © 2016 O'Reilly Japan, Inc. This translation is published and sold by permission of O'Reilly Japan, Inc., the owner of all rights to publish and sell the same.
The original Japanese edition was published by O'Reilly Japan, Inc. in 2016. This Simplified Chinese edition is published by Posts & Telecom Press, 2018, with the translation authorized by O'Reilly Japan, Inc., the owner of the rights to publish and sell it. All rights reserved. No part of this book may be reproduced in any form without written permission.
About O'Reilly Media, Inc.
O'Reilly Media spreads the knowledge of innovators through books, magazines, online services, research, and conferences. Since 1978, O'Reilly has been a witness to and driver of frontier developments. Alpha geeks are creating the future, and we track the technology trends that truly matter, amplifying the "faint signals" to spur society's adoption of new technology. As an active participant in the technology community, O'Reilly's history has been one of advocating, creating, and propagating innovation.
O'Reilly brought revolutionary "animal books" to software developers; built the first commercial website (GNN); organized the influential open source summit after which the open source software movement was named; and founded Make magazine, becoming a leading pioneer of the DIY revolution. The company continues, in many forms, to forge the bond between information and people. O'Reilly's conferences and summits bring together alpha geeks and visionary business leaders to map out the revolutionary ideas that launch new industries. As the information source of choice for technologists, O'Reilly now also delivers the knowledge of pioneering experts to everyday computer users. Whether through book publishing, online services, or in-person courses, every O'Reilly product reflects the company's unshakable belief: information is the force that sparks innovation.

Industry comments:
"The O'Reilly Radar blog is widely acclaimed." — Wired
"O'Reilly has built a multimillion-dollar business on a series of (I wish I had thought of them first) remarkable ideas." — Business 2.0
"O'Reilly Conference is the absolute model for gathering key thought leaders." — CRN
"An O'Reilly book stands for a useful, promising topic worth learning." — Irish Times
"Tim is an unconventional businessman who not only takes the longest, broadest view, but actually follows Yogi Berra's advice: 'When you come to a fork in the road, take it.' Looking back, Tim seems to have taken the fork every time, several of them fleeting opportunities, even though the main road was perfectly good." — Linux Journal
Contents
Translator's Preface
Preface
Chapter 1: Introduction to Python
  1.1 What Is Python?
  1.2 Installing Python
    1.2.1 Python Versions
    1.2.2 External Libraries Used
    1.2.3 The Anaconda Distribution
  1.3 The Python Interpreter
    1.3.1 Arithmetic Operations
    1.3.2 Data Types
    1.3.3 Variables
    1.3.4 Lists
    1.3.5 Dictionaries
    1.3.6 Booleans
    1.3.7 if Statements
    1.3.8 for Statements
    1.3.9 Functions
  1.4 Python Script Files
    1.4.1 Saving to a File
    1.4.2 Classes
  1.5 NumPy
    1.5.1 Importing NumPy
    1.5.2 Creating NumPy Arrays
    1.5.3 Arithmetic with NumPy
    1.5.4 N-Dimensional NumPy Arrays
    1.5.5 Broadcasting
    1.5.6 Accessing Elements
  1.6 Matplotlib
    1.6.1 Drawing Simple Graphs
    1.6.2 Features of pyplot
    1.6.3 Displaying Images
  1.7 Summary
Chapter 2: Perceptrons
  2.1 What Is a Perceptron?
  2.2 Simple Logic Circuits
    2.2.1 The AND Gate
    2.2.2 The NAND and OR Gates
  2.3 Implementing Perceptrons
    2.3.1 A Simple Implementation
    2.3.2 Introducing Weights and a Bias
    2.3.3 Implementation with Weights and a Bias
  2.4 Limitations of Perceptrons
    2.4.1 The XOR Gate
    2.4.2 Linearity and Nonlinearity
  2.5 Multilayer Perceptrons
    2.5.1 Combining Existing Gates
    2.5.2 Implementing the XOR Gate
  2.6 From NAND Gates to Computers
  2.7 Summary
Chapter 3: Neural Networks
  3.1 From Perceptrons to Neural Networks
    3.1.1 An Example of a Neural Network
    3.1.2 Reviewing Perceptrons
    3.1.3 Enter the Activation Function
  3.2 Activation Functions
    3.2.1 The Sigmoid Function
    3.2.2 Implementing the Step Function
    3.2.3 Graph of the Step Function
    3.2.4 Implementing the Sigmoid Function
    3.2.5 Comparing the Sigmoid and Step Functions
    3.2.6 Nonlinear Functions
    3.2.7 The ReLU Function
  3.3 Operations on Multidimensional Arrays
    3.3.1 Multidimensional Arrays
    3.3.2 Matrix Multiplication
    3.3.3 Inner Products in Neural Networks
  3.4 Implementing a 3-Layer Neural Network
    3.4.1 Notation
    3.4.2 Implementing Signal Transmission Between Layers
    3.4.3 Implementation Summary
  3.5 Designing the Output Layer
    3.5.1 The Identity and Softmax Functions
    3.5.2 Caveats When Implementing Softmax
    3.5.3 Characteristics of the Softmax Function
    3.5.4 Number of Neurons in the Output Layer
  3.6 Handwritten Digit Recognition
    3.6.1 The MNIST Dataset
    3.6.2 Inference with a Neural Network
    3.6.3 Batch Processing
  3.7 Summary
Chapter 4: Neural Network Training
  4.1 Learning from Data
    4.1.1 Data-Driven Approaches
    4.1.2 Training Data and Test Data
  4.2 Loss Functions
    4.2.1 Mean Squared Error
    4.2.2 Cross-Entropy Error
    4.2.3 Mini-Batch Learning
    4.2.4 Implementing Mini-Batch Cross-Entropy Error
    4.2.5 Why Use a Loss Function?
  4.3 Numerical Differentiation
    4.3.1 Derivatives
    4.3.2 Examples of Numerical Differentiation
    4.3.3 Partial Derivatives
  4.4 Gradients
    4.4.1 Gradient Descent
    4.4.2 Gradients of a Neural Network
  4.5 Implementing the Learning Algorithm
    4.5.1 A 2-Layer Neural Network Class
    4.5.2 Implementing Mini-Batches
    4.5.3 Evaluation on Test Data
  4.6 Summary
Chapter 5: Backpropagation
  5.1 Computational Graphs
    5.1.1 Solving with Computational Graphs
    5.1.2 Local Computation
    5.1.3 Why Use Computational Graphs?
  5.2 The Chain Rule
    5.2.1 Backward Propagation in Computational Graphs
    5.2.2 What Is the Chain Rule?
    5.2.3 The Chain Rule and Computational Graphs
  5.3 Backward Propagation
    5.3.1 Backward Propagation at an Addition Node
    5.3.2 Backward Propagation at a Multiplication Node
    5.3.3 The Apples Example
  5.4 Implementing Simple Layers
    5.4.1 Implementing the Multiplication Layer
    5.4.2 Implementing the Addition Layer
  5.5 Implementing Activation Function Layers
    5.5.1 The ReLU Layer
    5.5.2 The Sigmoid Layer
  5.6 Implementing the Affine and Softmax Layers
    5.6.1 The Affine Layer
    5.6.2 The Batch Version of the Affine Layer
    5.6.3 The Softmax-with-Loss Layer
  5.7 Implementing Backpropagation
    5.7.1 The Big Picture of Neural Network Training
    5.7.2 Implementing a Neural Network with Backpropagation
    5.7.3 Gradient Checking
    5.7.4 Training with Backpropagation
  5.8 Summary
Chapter 6: Training Techniques
  6.1 Updating Parameters
    6.1.1 The Story of an Adventurer
    6.1.2 SGD
    6.1.3 Drawbacks of SGD
    6.1.4 Momentum
    6.1.5 AdaGrad
    6.1.6 Adam
    6.1.7 Which Update Method Should We Use?
    6.1.8 Comparing Update Methods on the MNIST Dataset
  6.2 Initial Weight Values
    6.2.1 Can We Initialize the Weights to 0?
    6.2.2 Distributions of Hidden-Layer Activations
    6.2.3 Initial Weight Values for ReLU
    6.2.4 Comparing Weight Initializations on the MNIST Dataset
  6.3 Batch Normalization
    6.3.1 The Batch Normalization Algorithm
    6.3.2 Evaluating Batch Normalization
  6.4 Regularization
    6.4.1 Overfitting
    6.4.2 Weight Decay
    6.4.3 Dropout
  6.5 Validating Hyperparameters
    6.5.1 Validation Data
    6.5.2 Optimizing Hyperparameters
    6.5.3 Implementing Hyperparameter Optimization
  6.6 Summary
Chapter 7: Convolutional Neural Networks
  7.1 Overall Architecture
  7.2 The Convolution Layer
    7.2.1 Problems with Fully Connected Layers
    7.2.2 Convolution Operations
    7.2.3 Padding
    7.2.4 Stride
    7.2.5 Convolution over 3-Dimensional Data
    7.2.6 Thinking in Blocks
    7.2.7 Batch Processing
  7.3 The Pooling Layer
  7.4 Implementing the Convolution and Pooling Layers
    7.4.1 4-Dimensional Arrays
    7.4.2 Expansion with im2col
    7.4.3 Implementing the Convolution Layer
    7.4.4 Implementing the Pooling Layer
  7.5 Implementing a CNN
  7.6 Visualizing CNNs
    7.6.1 Visualizing First-Layer Weights
    7.6.2 Hierarchical Information Extraction
  7.7 Representative CNNs
    7.7.1 LeNet
    7.7.2 AlexNet
  7.8 Summary
Chapter 8: Deep Learning
  8.1 Making Networks Deeper
    8.1.1 Toward Deeper Networks
    8.1.2 Further Improving Recognition Accuracy
    8.1.3 Motivations for Adding Layers
  8.2 A Brief History of Deep Learning
    8.2.1 ImageNet
    8.2.2 VGG
    8.2.3 GoogLeNet
    8.2.4 ResNet
  8.3 Speeding Up Deep Learning
    8.3.1 Problems to Address
    8.3.2 Acceleration with GPUs
    8.3.3 Distributed Training
    8.3.4 Reducing the Bit Width of Arithmetic
  8.4 Deep Learning Applications
    8.4.1 Object Detection
    8.4.2 Image Segmentation
    8.4.3 Generating Image Captions
  8.5 The Future of Deep Learning
    8.5.1 Image Style Transfer
    8.5.2 Image Generation
    8.5.3 Autonomous Driving
    8.5.4 Deep Q-Networks (Reinforcement Learning)
  8.6 Summary
Appendix A: The Computational Graph of the Softmax-with-Loss Layer
  A.1 Forward Propagation
  A.2 Backward Propagation
  A.3 Summary
References
Translator's Preface

The wave of deep learning has been surging for some time now, and many related books have already been published. Among them are systematic introductions to the basic theory, such as Deep Learning by renowned scholars including Ian Goodfellow, as well as various introductory books on using deep learning frameworks. You may ask: is publishing another deep learning book now "too late"? Not at all, because this book examines deep learning from a very distinctive angle; its publication might even be called long awaited.

The book's greatest strength is that it "dissects" the technology underlying deep learning. As the American physicist Richard Phillips Feynman put it: "What I cannot create, I do not understand." Only by building something do you truly understand it, and this is a book that teaches you how to build deep learning models. Moreover, it uses no existing deep learning framework: relying only on basic mathematics and Python libraries wherever possible, it explains from scratch the mathematical principles behind the core problems of deep learning and builds a classic deep learning network from scratch.

The Japanese edition once topped the bestseller list for science and engineering books at the University of Tokyo campus bookstore (Hongo campus). Readers of all backgrounds can benefit from it. For engineers outside AI, it greatly lowers the barrier to entry into deep learning; for undergraduate and graduate students, it makes a fine textbook; and even readers who already use frameworks fluently to build all kinds of deep learning models at work can gain new insights from it.

From the start of translation to publication took about a year. In translating, I strove to be faithful to the original and concise in expression. To ensure quality, after finishing each chapter I set it aside for a while and then reviewed it again. The professional editors at Turing then carried out a further thorough and careful review of the manuscript and offered many valuable suggestions, for which I am grateful. Even so, given my limited learning, some errors or omissions are inevitable; I sincerely ask readers to point them out so that we can correct them in future printings.

Finally, I hope the publication of this book adds a brick to China's AI technology community!

Lu Yujie
February 2018, Shanghai
Preface

A world that once seemed like science fiction has become reality: artificial intelligence has defeated champions of shogi and chess, and recently even a Go champion. Smartphones not only understand what people say but can perform real-time "machine translation" during video calls. Camera-equipped "collision-avoiding cars" protect people's lives, and practical autonomous driving is not far off. Looking around us, things once thought possible only for humans are now done flawlessly by artificial intelligence, which even attempts to surpass us. Thanks to the development of AI, the world we live in is gradually becoming a new one.

Behind this astonishingly fast-moving world, deep learning plays a major role. Researchers around the world praise it without reserve as a revolutionary technology; some even consider it a breakthrough that comes only once in decades. Indeed, the term "deep learning" appears frequently in newspapers and magazines, and even the general public has heard of it.

This is a book about deep learning, and its goal is to help readers understand the technology as deeply as possible. Hence the concept it proposes: "from scratch."

The book's approach is to get at the essence of deep learning through the process of implementing it. By actually implementing deep learning programs, it introduces the related techniques as comprehensively as possible. It also provides runnable programs so that readers can carry out all kinds of experiments themselves.

Implementing deep learning demands many trials and a great deal of time, but in return you learn and discover a great deal, and the process itself is a fun and exciting one. I hope that through it readers will become familiar with the techniques used in deep learning and find joy in them.

Today, deep learning is at work all over the world: in the smartphones in almost everyone's hands, in cars driving autonomously, and in the servers powering web services. At this very moment, in places most people never notice, deep learning is quietly doing its job, and it is bound to become even more widespread. I wrote this book so that readers can understand the related techniques and feel the appeal of deep learning.

The Philosophy of This Book

This book explains deep learning starting from the most basic material, introducing one by one the pieces of knowledge needed to understand it. It uses plain language wherever possible to describe the concepts, characteristics, and workings of deep learning. But it does not stop at an overview of the technology; it aims to give readers a deeper understanding. That is one of its distinguishing features.

How, then, can one understand deep learning more deeply? In my view, the best way is to implement it yourself: write actually runnable programs from scratch, reading the source code while thinking things through. I firmly believe this approach is essential for correctly understanding deep learning (and other seemingly advanced technologies). "From scratch" here means relying as little as possible on external, ready-made components (libraries, tools, and so on). In other words, the goal of this book is to avoid black boxes whose contents are unclear and instead, starting from the most basic knowledge we can understand, implement state-of-the-art deep learning techniques step by step, and through that process deepen the reader's understanding.

If this book were a book about cars, it would not teach you how to drive; its focus is not how to operate a car but how a car works. To understand a car's structure, you open the hood, take the parts out one by one to examine them, and try operating them. Then you extract the car's essence in as simple a form as possible and assemble a model of it. The goal of this book is that, through the process of building a model car, readers feel they could actually build one, and along the way become familiar with the related technology.

To implement deep learning, this book uses the Python programming language. Python is very popular, and even beginners can use it with ease. It is especially well suited to prototyping: with Python you can immediately try out a sudden idea, observe the results, and run all kinds of experiments. This book explains the theory of deep learning while implementing programs in Python and conducting experiments.

When equations and theoretical explanations alone are not enough, reading and running the source code often makes the reasoning clear. Many readers will have had the experience of turning to source code to follow the flow of a technique when puzzled by the math.

This book seeks to understand deep learning through actual implementation (putting it into code); it is a book that emphasizes "engineering." Many equations appear in it, but so does plenty of source code written from a programmer's perspective.

Who This Book Is For

This book aims to give readers a deep understanding of deep learning through hands-on practice. To clarify the intended audience, the topics it covers are listed below.

• Implement deep learning programs from scratch in Python, using external libraries as little as possible.
• Introduce how to use Python, so that even Python beginners can follow.
• Provide actually runnable Python source code, along with a learning environment in which readers can experiment for themselves.
• Start from simple machine learning problems and ultimately build a system that recognizes images with high accuracy.
• Explain the theory of deep learning and neural networks in a clear, accessible way.
• Enable readers to understand, at the implementation level, techniques that look complex at first glance, such as backpropagation and convolution operations.
• Introduce practical techniques useful for deep learning, such as how to choose a learning rate and initial weight values.
• Introduce and implement recently popular methods such as Batch Normalization, Dropout, and Adam.
• Discuss questions such as why deep learning performs so well, why adding layers improves recognition accuracy, and why hidden layers matter.
• Introduce application examples of deep learning such as autonomous driving, image generation, and reinforcement learning.