# Residual Networks (ResNet)
By ctaoist · Published 2021-07-09 · Last updated 2022-03-04
## Introduction

**ResNet addresses the "degradation" problem of deep neural networks.**

What is "degradation"? We know that as layers are gradually stacked onto a shallow network, performance on both the training and test sets improves: the model becomes more complex and more expressive, so it can fit the underlying mapping better. "Degradation" refers to the situation where stacking even more layers makes performance drop sharply instead.

If the desired underlying mapping is $H(x)$, then instead of having $F(x)$ learn that mapping directly, it is easier to learn the residual $H(x) - x$, i.e. $F(x) := H(x) - x$. The original forward path then becomes $F(x) + x$, and $F(x) + x$ is used to fit $H(x)$.

![ResNet residual block](https://blog.qiniu.ctaoist.cn/ResNet_残差块.png)

Stacking many similar residual blocks in series forms a ResNet.

## Analysis and Improvement of the Residual Block

The paper [Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027) studies ResNet further. Through a theoretical analysis of back-propagation in ResNet and adjustments to the structure of the residual block, it arrives at a new structure, shown below.

![Improved ResNet residual block](http://blog.qiniu.ctaoist.cn/ResNet_残差块_改进.png)

**Note that the viewpoint here differs from before: the shortcut path is treated as the main path, and the residual path as the bypass.** The newly proposed residual block generalizes better and avoids "degradation" more effectively; even with more than 1000 layers stacked, performance keeps improving. The specific changes are:

- **Keeping the shortcut path "clean" lets information flow smoothly in both forward and backward propagation, which is very important.** To this end, no 1×1 convolution or similar operation is introduced on the shortcut unless necessary, and the ReLU on the gray path in the figure above is moved onto the $F(x)$ path.
- On the residual path, **BN and ReLU are placed before the weight layers as pre-activation**, which yields "ease of optimization" and "reducing overfitting".

ResNet lets redundant blocks learn to become identity mappings, so performance does not degrade. The "effective depth" of the network is therefore decided during training; in other words, ResNet has a kind of **depth-adaptive** ability.

## Several ResNet Structures

Two kinds of residual units; the left one is the BasicBlock and the right one is the BottleNeck:

![](http://blog.qiniu.ctaoist.cn/残差网络_不同的残差单元.png)

The roles of the 1x1 convolutions in the BottleNeck:

- Raise and lower the number of channels (integrating information across channels), realizing a linear combination of feature maps while keeping the feature-map size unchanged
- Compared with larger kernels, they greatly reduce the computational cost
- Stacking two 3x3 convolutions gives only one ReLU, while the 1x1 convolutions add two more ReLUs, introducing more non-linearity

Let us work out the saving from the 1x1 convolutions. For the bottleneck structure on the right of the figure above with a 256-dimensional input, the parameter count is 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69632. With the same input and output dimensions but two stacked 3x3 convolutions instead, the parameter count would be (3x3x256x256)x2 = 1179648. A quick calculation shows that the bottleneck with 1x1 convolutions cuts the parameter count to about 5.9% of the original, a very large saving.

Several different ResNets are built by stacking different residual units:

![](http://blog.qiniu.ctaoist.cn/几种resnet结构.png)

## TensorFlow 2 Implementation

The Keras high-level API in TensorFlow already defines the 50-, 101- and 152-layer ResNets (and their pre-activation "V2" variants):

```py
from tensorflow.keras.applications.resnet import ResNet50
```
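As a minimal usage sketch of this built-in model (the ImageNet weights are downloaded on first use, and the random input below is only there to show the call signature):

```py
import numpy as np
from tensorflow.keras.applications.resnet import ResNet50, preprocess_input, decode_predictions

# Load ResNet50 with pretrained ImageNet weights (example configuration).
model = ResNet50(weights="imagenet", input_shape=(224, 224, 3))

# Run inference on a dummy image batch to show the expected input/output format.
dummy = np.random.uniform(0, 255, size=(1, 224, 224, 3)).astype("float32")
preds = model.predict(preprocess_input(dummy))
print(decode_predictions(preds, top=3))  # top-3 ImageNet class predictions
```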
### ResNet Implementation

Implementation of the BasicBlock residual unit (this is the original version of the residual unit):

```py
import tensorflow as tf


class BasicBlock(tf.keras.layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(3, 3), strides=stride, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.Activation(tf.keras.activations.relu)
        self.conv2 = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(3, 3), strides=1, padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization()
        if stride != 1:
            # When the spatial size changes, project the shortcut with a 1x1
            # convolution so it matches the residual branch.
            self.downsample = tf.keras.Sequential()
            self.downsample.add(tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(1, 1), strides=stride))
            self.downsample.add(tf.keras.layers.BatchNormalization())
        else:
            self.downsample = lambda x: x

    def call(self, inputs, training=None):
        identity = self.downsample(inputs)

        conv1 = self.conv1(inputs)
        bn1 = self.bn1(conv1)
        relu = self.relu(bn1)
        conv2 = self.conv2(relu)
        bn2 = self.bn2(conv2)

        # Add the shortcut, then apply the final ReLU.
        output = tf.nn.relu(tf.keras.layers.add([identity, bn2]))
        return output
```

Implementation of the BottleNeck residual unit:

```py
class BottleNeck(tf.keras.layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BottleNeck, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(1, 1), strides=1, padding='same')
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(3, 3), strides=stride, padding='same')
        self.bn2 = tf.keras.layers.BatchNormalization()
        # The last 1x1 convolution expands the channels back to 4x filter_num.
        self.conv3 = tf.keras.layers.Conv2D(filters=filter_num * 4, kernel_size=(1, 1), strides=1, padding='same')
        self.bn3 = tf.keras.layers.BatchNormalization()

        # The shortcut always uses a 1x1 projection so its output matches the
        # 4x filter_num channels of the residual branch.
        self.downsample = tf.keras.Sequential()
        self.downsample.add(tf.keras.layers.Conv2D(filters=filter_num * 4, kernel_size=(1, 1), strides=stride))
        self.downsample.add(tf.keras.layers.BatchNormalization())

    def call(self, inputs, training=None):
        identity = self.downsample(inputs)

        conv1 = self.conv1(inputs)
        bn1 = self.bn1(conv1)
        relu1 = tf.nn.relu(bn1)
        conv2 = self.conv2(relu1)
        bn2 = self.bn2(conv2)
        relu2 = tf.nn.relu(bn2)
        conv3 = self.conv3(relu2)
        bn3 = self.bn3(conv3)

        output = tf.nn.relu(tf.keras.layers.add([identity, bn3]))
        return output
```

Functions for stacking blocks into layers:

```py
def make_basic_block_layer(filter_num, blocks, stride=1):
    res_block = tf.keras.Sequential()
    # Only the first block in a layer may downsample.
    res_block.add(BasicBlock(filter_num, stride=stride))
    for _ in range(1, blocks):
        res_block.add(BasicBlock(filter_num, stride=1))
    return res_block


def make_bottleneck_layer(filter_num, blocks, stride=1):
    res_block = tf.keras.Sequential()
    res_block.add(BottleNeck(filter_num, stride=stride))
    for _ in range(1, blocks):
        res_block.add(BottleNeck(filter_num, stride=1))
    return res_block
```

Building the models:

```py
NUM_CLASSES = 10  # assumed number of output classes; set this to match your dataset


class ResNetTypeI(tf.keras.Model):
    def __init__(self, layer_params):
        super(ResNetTypeI, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(7, 7), strides=2, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.pool1 = tf.keras.layers.MaxPool2D(pool_size=(3, 3), strides=2, padding="same")

        # Four stages of BasicBlocks; stages 2-4 downsample with stride 2.
        self.layer1 = make_basic_block_layer(filter_num=64, blocks=layer_params[0])
        self.layer2 = make_basic_block_layer(filter_num=128, blocks=layer_params[1], stride=2)
        self.layer3 = make_basic_block_layer(filter_num=256, blocks=layer_params[2], stride=2)
        self.layer4 = make_basic_block_layer(filter_num=512, blocks=layer_params[3], stride=2)

        self.avgpool = tf.keras.layers.GlobalAveragePooling2D()
        self.fc = tf.keras.layers.Dense(units=NUM_CLASSES, activation=tf.keras.activations.softmax)

    def call(self, inputs, training=None, mask=None):
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = tf.nn.relu(x)
        x = self.pool1(x)
        x = self.layer1(x, training=training)
        x = self.layer2(x, training=training)
        x = self.layer3(x, training=training)
        x = self.layer4(x, training=training)
        x = self.avgpool(x)
        output = self.fc(x)
        return output


class ResNetTypeII(tf.keras.Model):
    def __init__(self, layer_params):
        super(ResNetTypeII, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=(7, 7), strides=2, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.pool1 = tf.keras.layers.MaxPool2D(pool_size=(3, 3), strides=2, padding="same")

        # Four stages of BottleNeck blocks.
        self.layer1 = make_bottleneck_layer(filter_num=64, blocks=layer_params[0])
        self.layer2 = make_bottleneck_layer(filter_num=128, blocks=layer_params[1], stride=2)
        self.layer3 = make_bottleneck_layer(filter_num=256, blocks=layer_params[2], stride=2)
        self.layer4 = make_bottleneck_layer(filter_num=512, blocks=layer_params[3], stride=2)

        self.avgpool = tf.keras.layers.GlobalAveragePooling2D()
        self.fc = tf.keras.layers.Dense(units=NUM_CLASSES, activation=tf.keras.activations.softmax)

    def call(self, inputs, training=None, mask=None):
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = tf.nn.relu(x)
        x = self.pool1(x)
        x = self.layer1(x, training=training)
        x = self.layer2(x, training=training)
        x = self.layer3(x, training=training)
        x = self.layer4(x, training=training)
        x = self.avgpool(x)
        output = self.fc(x)
        return output


def resnet_18():
    return ResNetTypeI(layer_params=[2, 2, 2, 2])


def resnet_34():
    return ResNetTypeI(layer_params=[3, 4, 6, 3])


def resnet_50():
    return ResNetTypeII(layer_params=[3, 4, 6, 3])


def resnet_101():
    return ResNetTypeII(layer_params=[3, 4, 23, 3])


def resnet_152():
    return ResNetTypeII(layer_params=[3, 8, 36, 3])
```
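As a quick sanity check (a minimal sketch; the 224×224×3 input size is only an example, and `NUM_CLASSES` is the placeholder defined above), one of the factory functions can be instantiated and run on a random batch:

```py
import tensorflow as tf

# Build an 18-layer ResNet and run a dummy batch through it.
model = resnet_18()
dummy = tf.random.normal((2, 224, 224, 3))
out = model(dummy, training=False)
print(out.shape)  # (2, NUM_CLASSES)

model.summary()  # weights exist after the first call, so summary() works here
```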
### ResNeXt Implementation

ResNeXt is an improvement on ResNet with the following advantages:

- The network structure is simple and modular
- There are few hyperparameters that need manual tuning
- With the same number of parameters it gives better results than ResNet: a 101-layer ResNeXt reaches roughly the accuracy of a 200-layer ResNet while requiring only about half the computation

Comparison of ResNet and ResNeXt:

![](http://blog.qiniu.ctaoist.cn/resnext-resnet残差块对比.png)

In the figure above, the left side is a ResNet residual block and the right side is a ResNeXt residual block, which uses grouped convolutions; both sides have the same number of parameters.

![](http://blog.qiniu.ctaoist.cn/ResNeXt_残差块等价模型.png)

The three structures (a), (b) and (c) are all equivalent; the paper uses structure (c), which is built on grouped convolutions.
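To make structure (c) concrete, here is a minimal sketch of a ResNeXt-style bottleneck block written in the same style as the implementation above. It assumes TensorFlow 2.3 or newer, where `Conv2D` accepts a `groups` argument; the `ResNeXtBlock` name, the cardinality of 32 and the channel widths are illustrative choices following the common ResNeXt-50 (32x4d) configuration.

```py
import tensorflow as tf


class ResNeXtBlock(tf.keras.layers.Layer):
    """Sketch of a ResNeXt bottleneck block using a grouped 3x3 convolution (structure (c))."""

    def __init__(self, filter_num, stride=1, cardinality=32):
        super(ResNeXtBlock, self).__init__()
        # 1x1 convolution reduces the channels to the bottleneck width.
        self.conv1 = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(1, 1), strides=1, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        # 3x3 grouped convolution: channels are split into `cardinality` groups.
        self.conv2 = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=(3, 3), strides=stride,
                                            padding="same", groups=cardinality)
        self.bn2 = tf.keras.layers.BatchNormalization()
        # 1x1 convolution expands the channels back (2x the bottleneck width in ResNeXt).
        self.conv3 = tf.keras.layers.Conv2D(filters=filter_num * 2, kernel_size=(1, 1), strides=1, padding="same")
        self.bn3 = tf.keras.layers.BatchNormalization()

        # Projection shortcut so the identity branch matches the output shape.
        self.downsample = tf.keras.Sequential([
            tf.keras.layers.Conv2D(filters=filter_num * 2, kernel_size=(1, 1), strides=stride),
            tf.keras.layers.BatchNormalization(),
        ])

    def call(self, inputs, training=None):
        identity = self.downsample(inputs, training=training)
        x = tf.nn.relu(self.bn1(self.conv1(inputs), training=training))
        x = tf.nn.relu(self.bn2(self.conv2(x), training=training))
        x = self.bn3(self.conv3(x), training=training)
        return tf.nn.relu(x + identity)


# Example: 128 bottleneck channels split into 32 groups of 4, with downsampling.
block = ResNeXtBlock(filter_num=128, stride=2, cardinality=32)
print(block(tf.random.normal((2, 56, 56, 256)), training=False).shape)  # (2, 28, 28, 256)
```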