# Residual Networks (ResNet)
By ctaoist · Published 2021-07-09 · Last updated 2022-03-04
## Introduction

**ResNet was designed to solve the "degradation" problem of deep neural networks.** What is "degradation"? We know that gradually stacking layers onto a shallow network improves performance on both the training and test sets: the model becomes more complex and more expressive, so it can fit the underlying mapping better. "Degradation" refers to the situation where, after stacking even more layers, performance instead drops rapidly.

If the desired underlying mapping is $H(x)$, then instead of having $F(x)$ learn that mapping directly, it is better to learn the residual $H(x) - x$, i.e. $F(x) := H(x) - x$. The original forward path then becomes $F(x) + x$, and $F(x) + x$ is used to fit $H(x)$.

![ResNet residual block](https://blog.qiniu.ctaoist.cn/ResNet_残差块.png)

A ResNet is built by chaining together many similar residual blocks.

## Analysis and Improvement of the Residual Block

The paper [Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027) studies ResNet further. Based on a theoretical analysis of back-propagation through ResNet and adjustments to the structure of the residual block, it proposes a new block design, shown below.

![Improved ResNet residual block](http://blog.qiniu.ctaoist.cn/ResNet_残差块_改进.png)

**Note that the viewpoint here differs from before: the shortcut path is treated as the main path, and the residual path as a side branch.** The newly proposed residual block generalizes better and avoids "degradation" more effectively; even with more than 1000 stacked layers, performance keeps improving. The specific changes are:

- **Keeping the shortcut path "clean" lets information propagate smoothly in both the forward and backward passes, which is essential.** To this end, operations such as 1×1 convolutions are not inserted on the shortcut unless necessary, and the ReLU on the gray path in the figure above is moved onto the $F(x)$ path.
- On the residual path, **BN and ReLU are placed before the weight layers as pre-activation**, which brings "ease of optimization" and "reducing overfitting".

ResNet lets redundant blocks learn the identity mapping, so performance does not degrade. The network's "effective depth" is therefore determined during training; in other words, ResNet has a kind of **depth-adaptive** capability.

## Several ResNet Architectures

There are two kinds of residual units: the BasicBlock on the left and the BottleNeck on the right:

![](http://blog.qiniu.ctaoist.cn/残差网络_不同的残差单元.png)

The 1x1 convolutions in the BottleNeck serve several purposes:

- They raise and lower the channel dimension (integrating information across channels), forming linear combinations of multiple feature maps while keeping the spatial size of the feature maps unchanged.
- Compared with larger kernels, they greatly reduce the computational cost.
- Stacking two 3x3 convolutions leaves only one ReLU between them, whereas the 1x1 convolutions add two more ReLUs, introducing more non-linearity.

Let us quantify the savings brought by the 1x1 convolutions. For the bottleneck structure on the right of the figure above with a 256-channel input, the parameter count is 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69632. With the same input and output dimensions but two stacked 3x3 convolutions instead of the 1x1 convolutions, the parameter count would be (3x3x256x256)x2 = 1179648. A quick calculation shows that the bottleneck with 1x1 convolutions cuts the computation to about 5.9% of the original, a very large saving.

Different ResNet variants are obtained by stacking different residual units:

![](http://blog.qiniu.ctaoist.cn/几种resnet结构.png)

## Tensorflow2 Implementation

The high-level Keras API in TensorFlow already provides predefined 50-, 101- and 152-layer ResNets:

```py
from tensorflow.keras.applications.resnet import ResNet50
```

### ResNet Implementation

Implementation of the BasicBlock residual unit; this is the original version of the residual unit:

```py
import tensorflow as tf


class BasicBlock(tf.keras.layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()
        # Residual path: two 3x3 convolutions, each followed by batch normalization.
        self.conv1 = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(3, 3),
                                            strides=stride, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.Activation(tf.keras.activations.relu)
        self.conv2 = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(3, 3),
                                            strides=1, padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization()
        # Shortcut path: a 1x1 convolution matches the spatial size when the block
        # downsamples; otherwise the input is passed through unchanged.
        if stride != 1:
            self.downsample = tf.keras.Sequential()
            self.downsample.add(tf.keras.layers.Conv2D(filters=filter_num,
                                                       kernel_size=(1, 1), strides=stride))
            self.downsample.add(tf.keras.layers.BatchNormalization())
        else:
            self.downsample = lambda x: x

    def call(self, inputs, training=None):
        identity = self.downsample(inputs)
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x, training=training)
        # F(x) + x, followed by the final ReLU
        return tf.nn.relu(tf.keras.layers.add([identity, x]))
```

Implementation of the BottleNeck residual unit:

```py
class BottleNeck(tf.keras.layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BottleNeck, self).__init__()
        # Residual path: 1x1 reduce -> 3x3 -> 1x1 expand (by a factor of 4).
        self.conv1 = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(1, 1),
                                            strides=1, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(3, 3),
                                            strides=stride, padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.conv3 = tf.keras.layers.Conv2D(filters=filter_num * 4, kernel_size=(1, 1),
                                            strides=1, padding="same")
        self.bn3 = tf.keras.layers.BatchNormalization()
        # Shortcut path: always projects, since the output has 4x the channels.
        self.downsample = tf.keras.Sequential()
        self.downsample.add(tf.keras.layers.Conv2D(filters=filter_num * 4,
                                                   kernel_size=(1, 1), strides=stride))
        self.downsample.add(tf.keras.layers.BatchNormalization())

    def call(self, inputs, training=None):
        identity = self.downsample(inputs)
        x = self.conv1(inputs)
        x = tf.nn.relu(self.bn1(x, training=training))
        x = self.conv2(x)
        x = tf.nn.relu(self.bn2(x, training=training))
        x = self.conv3(x)
        x = self.bn3(x, training=training)
        return tf.nn.relu(tf.keras.layers.add([identity, x]))
```

Functions that stack the residual units into a stage:

```py
def make_basic_block_layer(filter_num, blocks, stride=1):
    res_block = tf.keras.Sequential()
    # Only the first block of a stage may downsample.
    res_block.add(BasicBlock(filter_num, stride=stride))
    for _ in range(1, blocks):
        res_block.add(BasicBlock(filter_num, stride=1))
    return res_block


def make_bottleneck_layer(filter_num, blocks, stride=1):
    res_block = tf.keras.Sequential()
    res_block.add(BottleNeck(filter_num, stride=stride))
    for _ in range(1, blocks):
        res_block.add(BottleNeck(filter_num, stride=1))
    return res_block
```

Building the models:

```py
NUM_CLASSES = 1000  # example value; set this to the number of classes in your task


class ResNetTypeI(tf.keras.Model):
    def __init__(self, layer_params):
        super(ResNetTypeI, self).__init__()
        # Stem: 7x7 convolution followed by 3x3 max pooling.
        self.conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(7, 7),
                                            strides=2, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.pool1 = tf.keras.layers.MaxPool2D(pool_size=(3, 3), strides=2, padding="same")
        # Four stages of BasicBlocks.
        self.layer1 = make_basic_block_layer(filter_num=64, blocks=layer_params[0])
        self.layer2 = make_basic_block_layer(filter_num=128, blocks=layer_params[1], stride=2)
        self.layer3 = make_basic_block_layer(filter_num=256, blocks=layer_params[2], stride=2)
        self.layer4 = make_basic_block_layer(filter_num=512, blocks=layer_params[3], stride=2)
        self.avgpool = tf.keras.layers.GlobalAveragePooling2D()
        self.fc = tf.keras.layers.Dense(units=NUM_CLASSES, activation=tf.keras.activations.softmax)

    def call(self, inputs, training=None, mask=None):
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = tf.nn.relu(x)
        x = self.pool1(x)
        x = self.layer1(x, training=training)
        x = self.layer2(x, training=training)
        x = self.layer3(x, training=training)
        x = self.layer4(x, training=training)
        x = self.avgpool(x)
        return self.fc(x)


class ResNetTypeII(tf.keras.Model):
    def __init__(self, layer_params):
        super(ResNetTypeII, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(7, 7),
                                            strides=2, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.pool1 = tf.keras.layers.MaxPool2D(pool_size=(3, 3), strides=2, padding="same")
        # Four stages of BottleNecks.
        self.layer1 = make_bottleneck_layer(filter_num=64, blocks=layer_params[0])
        self.layer2 = make_bottleneck_layer(filter_num=128, blocks=layer_params[1], stride=2)
        self.layer3 = make_bottleneck_layer(filter_num=256, blocks=layer_params[2], stride=2)
        self.layer4 = make_bottleneck_layer(filter_num=512, blocks=layer_params[3], stride=2)
        self.avgpool = tf.keras.layers.GlobalAveragePooling2D()
        self.fc = tf.keras.layers.Dense(units=NUM_CLASSES, activation=tf.keras.activations.softmax)

    def call(self, inputs, training=None, mask=None):
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = tf.nn.relu(x)
        x = self.pool1(x)
        x = self.layer1(x, training=training)
        x = self.layer2(x, training=training)
        x = self.layer3(x, training=training)
        x = self.layer4(x, training=training)
        x = self.avgpool(x)
        return self.fc(x)


def resnet_18():
    return ResNetTypeI(layer_params=[2, 2, 2, 2])


def resnet_34():
    return ResNetTypeI(layer_params=[3, 4, 6, 3])


def resnet_50():
    return ResNetTypeII(layer_params=[3, 4, 6, 3])


def resnet_101():
    return ResNetTypeII(layer_params=[3, 4, 23, 3])


def resnet_152():
    return ResNetTypeII(layer_params=[3, 8, 36, 3])
```
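As a quick sanity check of the implementation above, the sketch below instantiates `resnet_18()`, builds it on an assumed 224x224 RGB input, and runs a random batch through it. The input size and the `NUM_CLASSES` value are illustrative assumptions, not part of the original post.

```py
import tensorflow as tf

# Assumes the classes and factory functions above are already defined in scope
# and that NUM_CLASSES has been set (e.g. NUM_CLASSES = 10 for a 10-class task).
model = resnet_18()
model.build(input_shape=(None, 224, 224, 3))  # 224x224 RGB is just an example size
model.summary()

# Forward pass on a dummy batch; the output shape should be (2, NUM_CLASSES).
dummy = tf.random.normal((2, 224, 224, 3))
print(model(dummy, training=False).shape)
```

The same pattern works for `resnet_50()` and the deeper variants; only `layer_params` changes.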
### ResNeXt Implementation

ResNeXt is an improvement on ResNet with the following advantages:

- The network structure is simple and modular.
- Few hyperparameters need to be tuned by hand.
- For the same number of parameters it gives better results than ResNet: a 101-layer ResNeXt reaches roughly the same accuracy as a 200-layer ResNet with only about half the computation.

Comparison of ResNet and ResNeXt:

![](http://blog.qiniu.ctaoist.cn/resnext-resnet残差块对比.png)

In the figure above, the left side is a ResNet residual block and the right side is a ResNeXt residual block, which uses grouped convolution; both sides have the same number of parameters.

![](http://blog.qiniu.ctaoist.cn/ResNeXt_残差块等价模型.png)

The three structures (a), (b) and (c) are equivalent; the paper uses structure (c), which is implemented with grouped convolutions.
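The post does not include ResNeXt code, so below is only a minimal sketch of a ResNeXt-style bottleneck block in the spirit of structure (c). It relies on the `groups` argument of `tf.keras.layers.Conv2D` (available in TensorFlow 2.3 and later) for the grouped 3x3 convolution; the class name `ResNeXtBlock`, the `cardinality=32` default, and the channel widths are assumptions following the common ResNeXt "32x4d" configuration, not the reference implementation.

```py
import tensorflow as tf


class ResNeXtBlock(tf.keras.layers.Layer):
    """Sketch of a ResNeXt bottleneck block, structure (c): 1x1 -> grouped 3x3 -> 1x1."""

    def __init__(self, filter_num, stride=1, cardinality=32):
        super(ResNeXtBlock, self).__init__()
        # 1x1 convolution reduces the channel dimension.
        self.conv1 = tf.keras.layers.Conv2D(filter_num, (1, 1), strides=1, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        # Grouped 3x3 convolution: `cardinality` parallel paths realized in one layer
        # (requires TensorFlow >= 2.3 and filter_num divisible by cardinality).
        self.conv2 = tf.keras.layers.Conv2D(filter_num, (3, 3), strides=stride,
                                            padding="same", groups=cardinality)
        self.bn2 = tf.keras.layers.BatchNormalization()
        # 1x1 convolution expands the channels back (factor 2 in the 32x4d setting).
        self.conv3 = tf.keras.layers.Conv2D(filter_num * 2, (1, 1), strides=1, padding="same")
        self.bn3 = tf.keras.layers.BatchNormalization()
        # Projection shortcut so the addition has matching shapes.
        self.downsample = tf.keras.Sequential([
            tf.keras.layers.Conv2D(filter_num * 2, (1, 1), strides=stride),
            tf.keras.layers.BatchNormalization(),
        ])

    def call(self, inputs, training=None):
        identity = self.downsample(inputs, training=training)
        x = tf.nn.relu(self.bn1(self.conv1(inputs), training=training))
        x = tf.nn.relu(self.bn2(self.conv2(x), training=training))
        x = self.bn3(self.conv3(x), training=training)
        return tf.nn.relu(x + identity)
```

For example, `ResNeXtBlock(filter_num=128, stride=1)` applied to a 256-channel feature map corresponds roughly to the ResNeXt block in the comparison figure above; such blocks could be stacked with helper functions analogous to `make_bottleneck_layer`.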
## References

- [ResNet及其变种的结构梳理、有效性分析与代码解读](https://zhuanlan.zhihu.com/p/54289848)
- [TensorFlow2.0实现ResNeXt](https://zhuanlan.zhihu.com/p/85627290)