GoogLeNet

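GoogLeNet (also known as Inception v1) is a deep convolutional neural network proposed by a Google team in 2014 in the paper "Going Deeper with Convolutions".

  • Competition result: winner of the ILSVRC 2014 image classification task, with a top-5 error rate of 6.67% (no external data)

  • Network depth: 22 layers (about 1/12 the parameters of AlexNet, and far fewer than VGG)

  • Key idea: the Inception module combines parallel multi-scale convolutions with 1×1 dimensionality reduction to improve efficiency and keep computation under control

Its network architecture is as follows: [architecture figure]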

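Innovations and Breakthroughs

Before GoogLeNet, the mainstream approach (AlexNet, VGG) was to stack more layers, which led to:

  1. Exploding computation: too many parameters, slow training and inference
  2. Vanishing gradients: deep networks are hard to train
  3. Overfitting: the model becomes too complex

GoogLeNet's idea: go "wide" rather than only "deep". Extracting multi-scale features in parallel within a single layer gives stronger representational power for the same computational budget.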

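Inception Module (Parallel Multi-Scale + Dimensionality Reduction)

Branch 1: 1×1 convolution. Fuses information across channels and reduces dimensionality directly.

Branch 2: 1×1 → 3×3 convolution. Reduces dimensionality first, then convolves, which cuts computation.

Branch 3: 1×1 → 5×5 convolution. Captures features over a larger receptive field.

Branch 4: 3×3 max pooling → 1×1 convolution. Reduces dimensionality after pooling while preserving spatial information.

Output: the four branches are concatenated along the channel dimension to form a multi-scale feature map.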
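To make the saving from the 1×1 reduction concrete, the sketch below counts convolution weights for the 5×5 branch with and without a 1×1 bottleneck. It is only an illustration: the helper names (conv_weights, direct_5x5, reduced_5x5) are hypothetical and not part of any framework, and biases are ignored.

```c
#include <assert.h>

/* Number of weights in a k x k convolution (biases ignored). */
static long conv_weights(long in_ch, long out_ch, long k)
{
    assert(in_ch > 0 && out_ch > 0 && k > 0);
    return k * k * in_ch * out_ch;
}

/* 5x5 convolution applied directly to the input channels. */
static long direct_5x5(long in_ch, long out_ch)
{
    return conv_weights(in_ch, out_ch, 5);
}

/* 1x1 reduction to mid_ch channels, followed by the 5x5 convolution. */
static long reduced_5x5(long in_ch, long mid_ch, long out_ch)
{
    return conv_weights(in_ch, mid_ch, 1) + conv_weights(mid_ch, out_ch, 5);
}
```

With 192 input channels, a reduction to 16 channels, and 32 output filters, direct_5x5(192, 32) gives 153,600 weights, while reduced_5x5(192, 16, 32) gives 3,072 + 12,800 = 15,872, which is nearly ten times fewer.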

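Advantages

  • Efficient and lightweight: few parameters and low computation, suitable for deployment
  • Multi-scale features: captures fine detail and global information in parallel
  • Good generalization: global average pooling (GAP), Dropout, and auxiliary classifiers provide strong regularization

Disadvantages

  • Complex structure: many branches, hard to design and tune
  • Gradients can still vanish in deep layers: less effective than the residual connections of the later ResNet
  • Many training tricks: learning rate, weight decay, and similar settings need careful tuning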

Building the Inception Module

You need shortcut_layer and inception_layer to build an Inception module, as shown below:

// Branch 1: 1x1 convolution
l1 = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

// Branch 2: 1x1 reduction, then 3x3 convolution
l2 = make_shortcut_layer(l1, 1, "linear");
l3 = make_convolutional_layer(96, 1, 1, 0, 1, "relu");
l4 = make_convolutional_layer(128, 3, 1, 1, 1, "relu");

// Branch 3: 1x1 reduction, then 5x5 convolution
l5 = make_shortcut_layer(l1, 1, "linear");
l6 = make_convolutional_layer(16, 1, 1, 0, 1, "relu");
l7 = make_convolutional_layer(32, 5, 1, 2, 1, "relu");

// Branch 4: 3x3 max pooling, then 1x1 convolution
l8 = make_shortcut_layer(l1, 1, "linear");
l9 = make_maxpool_layer(3, 1, 1);
l10 = make_convolutional_layer(32, 1, 1, 0, 1, "relu");

// Merge the branches along the channel dimension (dim = 2)
Layer **inception3a = malloc(4*sizeof(Layer*));
l11 = make_inception_layer(inception3a, 4, 2);
inception3a[0] = l2;
inception3a[1] = l5;
inception3a[2] = l8;
inception3a[3] = l11;

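In mode 1, shortcut_layer takes the input of the linked layer directly; inception_layer merges the outputs of multiple branches.

Lumos organizes data as [width:0, height:1, channel:2, batch:3], so when inception_layer merges along the channel dimension, the dim parameter is set to 2.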
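The following sketch shows what a channel-wise merge could look like for a single sample. It assumes that index 0 (width) varies fastest in memory, so each branch is a contiguous run of channel planes; that memory order is an assumption, and concat_channels is a hypothetical helper, not the Lumos implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Concatenate n feature maps of one sample along the channel axis.
 * Assumed layout: width varies fastest, then height, then channel, so
 * branch i is a contiguous block of channels[i] planes of size w * h.
 * Returns the total number of output channels.
 */
static int concat_channels(const float *const *branches, const int *channels,
                           int n, int w, int h, float *out)
{
    assert(n > 0 && w > 0 && h > 0);
    size_t plane = (size_t)w * (size_t)h;
    size_t offset = 0;
    int total = 0;
    for (int i = 0; i < n; ++i) {
        size_t count = (size_t)channels[i] * plane;
        memcpy(out + offset, branches[i], count * sizeof(float));
        offset += count;
        total += channels[i];
    }
    return total;
}
```

For the module above, the four branches produce 64, 128, 32, and 32 channels, so the merged feature map would have 256 channels.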

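The CIFAR-100 Dataset

CIFAR-100 (CIFAR stands for the Canadian Institute for Advanced Research) is a classification benchmark of 32×32 color images, released in 2009 by Alex Krizhevsky et al. It extends CIFAR-10 with more classes, finer granularity, and higher difficulty.

Basic information

  • Total images: 60,000 (RGB color, 32×32×3, uint8)
  • Training set: 50,000 images (100 classes, 500 per class)
  • Test set: 10,000 images (100 classes, 100 per class)
  • Classes: 100 fine classes grouped into 20 coarse superclasses, each containing 5 fine classes

Download: CIFAR-10 and CIFAR-100 datasets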
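If you start from the binary version of the dataset, each image is stored as a fixed-size record: one coarse-label byte, one fine-label byte, then 3,072 pixel bytes (the red, green, and blue planes in turn, each in row-major order). The sketch below decodes one such record from a buffer. The type and function names are hypothetical, and the Lumos example in this article reads text list files instead, so this is only relevant if you convert the data yourself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CIFAR_PIXELS (32 * 32 * 3)         /* 3072 bytes: R, G, B planes */
#define CIFAR100_RECORD (2 + CIFAR_PIXELS) /* coarse label + fine label + pixels */

typedef struct {
    unsigned char coarse;               /* superclass index, 0..19 */
    unsigned char fine;                 /* class index, 0..99 */
    unsigned char pixels[CIFAR_PIXELS]; /* channel-major, row-major per plane */
} Cifar100Sample;

/* Decode the record at position `index` in a buffer of whole records. */
static void cifar100_decode(const unsigned char *buf, size_t index,
                            Cifar100Sample *out)
{
    const unsigned char *rec = buf + index * CIFAR100_RECORD;
    assert(rec[0] < 20 && rec[1] < 100);
    out->coarse = rec[0];
    out->fine = rec[1];
    memcpy(out->pixels, rec + 2, CIFAR_PIXELS);
}
```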

Building the Code

We build the network with the Lumos framework. This example omits the auxiliary classifiers.

Graph *graph = create_graph();
Layer **ls = malloc(112*sizeof(Layer*));
ls[0] = make_convolutional_layer(64, 7, 2, 3, 0, "linear");
ls[1] = make_normalization_layer(0.1, 1, "relu");
ls[2] = make_maxpool_layer(3, 2, 1);
ls[3] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");
ls[4] = make_convolutional_layer(192, 3, 1, 1, 0, "linear");
ls[5] = make_normalization_layer(0.1, 1, "relu");
ls[6] = make_maxpool_layer(3, 2, 1);

ls[7] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

ls[8] = make_shortcut_layer(ls[7], 1, "linear");
ls[9] = make_convolutional_layer(96, 1, 1, 0, 1, "relu");
ls[10] = make_convolutional_layer(128, 3, 1, 1, 1, "relu");

ls[11] = make_shortcut_layer(ls[7], 1, "linear");
ls[12] = make_convolutional_layer(16, 1, 1, 0, 1, "relu");
ls[13] = make_convolutional_layer(32, 5, 1, 2, 1, "relu");

ls[14] = make_shortcut_layer(ls[7], 1, "linear");
ls[15] = make_maxpool_layer(3, 1, 1);
ls[16] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");

Layer **inception3a = malloc(4*sizeof(Layer*));
ls[17] = make_inception_layer(inception3a, 4, 2);
inception3a[0] = ls[8];
inception3a[1] = ls[11];
inception3a[2] = ls[14];
inception3a[3] = ls[17];

ls[18] = make_convolutional_layer(128, 1, 1, 0, 1, "relu"); // 16

ls[19] = make_shortcut_layer(ls[18], 1, "linear");
ls[20] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");
ls[21] = make_convolutional_layer(192, 3, 1, 1, 1, "relu");

ls[22] = make_shortcut_layer(ls[18], 1, "linear");
ls[23] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
ls[24] = make_convolutional_layer(96, 5, 1, 2, 1, "relu");

ls[25] = make_shortcut_layer(ls[18], 1, "linear");
ls[26] = make_maxpool_layer(3, 1, 1);
ls[27] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

Layer **inception3b = malloc(4*sizeof(Layer*));
ls[28] = make_inception_layer(inception3b, 4, 2);
inception3b[0] = ls[19];
inception3b[1] = ls[22];
inception3b[2] = ls[25];
inception3b[3] = ls[28];

ls[29] = make_maxpool_layer(3, 2, 1); // 27

ls[30] = make_convolutional_layer(192, 1, 1, 0, 1, "relu");

ls[31] = make_shortcut_layer(ls[30], 1, "linear");
ls[32] = make_convolutional_layer(96, 1, 1, 0, 1, "relu");
ls[33] = make_convolutional_layer(208, 3, 1, 1, 1, "relu");

ls[34] = make_shortcut_layer(ls[30], 1, "linear");
ls[35] = make_convolutional_layer(16, 1, 1, 0, 1, "relu");
ls[36] = make_convolutional_layer(48, 5, 1, 2, 1, "relu");

ls[37] = make_shortcut_layer(ls[30], 1, "linear");
ls[38] = make_maxpool_layer(3, 1, 1);
ls[39] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

Layer **inception4a = malloc(4*sizeof(Layer*));
ls[40] = make_inception_layer(inception4a, 4, 2);
inception4a[0] = ls[31];
inception4a[1] = ls[34];
inception4a[2] = ls[37];
inception4a[3] = ls[40];

// auxiliary classifier position
ls[41] = make_convolutional_layer(160, 1, 1, 0, 1, "relu"); // 39

ls[42] = make_shortcut_layer(ls[41], 1, "linear");
ls[43] = make_convolutional_layer(112, 1, 1, 0, 1, "relu");
ls[44] = make_convolutional_layer(224, 3, 1, 1, 1, "relu");

ls[45] = make_shortcut_layer(ls[41], 1, "linear");
ls[46] = make_convolutional_layer(24, 1, 1, 0, 1, "relu");
ls[47] = make_convolutional_layer(64, 5, 1, 2, 1, "relu");

ls[48] = make_shortcut_layer(ls[41], 1, "linear");
ls[49] = make_maxpool_layer(3, 1, 1);
ls[50] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

Layer **inception4b = malloc(4*sizeof(Layer*));
ls[51] = make_inception_layer(inception4b, 4, 2);
inception4b[0] = ls[42];
inception4b[1] = ls[45];
inception4b[2] = ls[48];
inception4b[3] = ls[51];

ls[52] = make_convolutional_layer(128, 1, 1, 0, 1, "relu"); // 50

ls[53] = make_shortcut_layer(ls[52], 1, "linear");
ls[54] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");
ls[55] = make_convolutional_layer(256, 3, 1, 1, 1, "relu");

ls[56] = make_shortcut_layer(ls[52], 1, "linear");
ls[57] = make_convolutional_layer(24, 1, 1, 0, 1, "relu");
ls[58] = make_convolutional_layer(64, 5, 1, 2, 1, "relu");

ls[59] = make_shortcut_layer(ls[52], 1, "linear");
ls[60] = make_maxpool_layer(3, 1, 1);
ls[61] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

Layer **inception4c = malloc(4*sizeof(Layer*));
ls[62] = make_inception_layer(inception4c, 4, 2);
inception4c[0] = ls[53];
inception4c[1] = ls[56];
inception4c[2] = ls[59];
inception4c[3] = ls[62];

ls[63] = make_convolutional_layer(112, 1, 1, 0, 1, "relu"); // 61

ls[64] = make_shortcut_layer(ls[63], 1, "linear");
ls[65] = make_convolutional_layer(144, 1, 1, 0, 1, "relu");
ls[66] = make_convolutional_layer(288, 3, 1, 1, 1, "relu");

ls[67] = make_shortcut_layer(ls[63], 1, "linear");
ls[68] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
ls[69] = make_convolutional_layer(64, 5, 1, 2, 1, "relu");

ls[70] = make_shortcut_layer(ls[63], 1, "linear");
ls[71] = make_maxpool_layer(3, 1, 1);
ls[72] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

Layer **inception4d = malloc(4*sizeof(Layer*));
ls[73] = make_inception_layer(inception4d, 4, 2);
inception4d[0] = ls[64];
inception4d[1] = ls[67];
inception4d[2] = ls[70];
inception4d[3] = ls[73];

// auxiliary classifier position
ls[74] = make_convolutional_layer(256, 1, 1, 0, 1, "relu"); // 72

ls[75] = make_shortcut_layer(ls[74], 1, "linear");
ls[76] = make_convolutional_layer(160, 1, 1, 0, 1, "relu");
ls[77] = make_convolutional_layer(320, 3, 1, 1, 1, "relu");

ls[78] = make_shortcut_layer(ls[74], 1, "linear");
ls[79] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
ls[80] = make_convolutional_layer(128, 5, 1, 2, 1, "relu");

ls[81] = make_shortcut_layer(ls[74], 1, "linear");
ls[82] = make_maxpool_layer(3, 1, 1);
ls[83] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");

Layer **inception4e = malloc(4*sizeof(Layer*));
ls[84] = make_inception_layer(inception4e, 4, 2);
inception4e[0] = ls[75];
inception4e[1] = ls[78];
inception4e[2] = ls[81];
inception4e[3] = ls[84];

ls[85] = make_maxpool_layer(2, 2, 0); // 83 25

ls[86] = make_convolutional_layer(256, 1, 1, 0, 1, "relu");

ls[87] = make_shortcut_layer(ls[86], 1, "linear");
ls[88] = make_convolutional_layer(160, 1, 1, 0, 1, "relu");
ls[89] = make_convolutional_layer(320, 3, 1, 1, 1, "relu");

ls[90] = make_shortcut_layer(ls[86], 1, "linear");
ls[91] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
ls[92] = make_convolutional_layer(128, 5, 1, 2, 1, "relu");

ls[93] = make_shortcut_layer(ls[86], 1, "linear");
ls[94] = make_maxpool_layer(3, 1, 1);
ls[95] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");

Layer **inception5a = malloc(4*sizeof(Layer*));
ls[96] = make_inception_layer(inception5a, 4, 2);
inception5a[0] = ls[87];
inception5a[1] = ls[90];
inception5a[2] = ls[93];
inception5a[3] = ls[96];

ls[97] = make_convolutional_layer(384, 1, 1, 0, 1, "relu"); // 95

ls[98] = make_shortcut_layer(ls[97], 1, "linear");
ls[99] = make_convolutional_layer(192, 1, 1, 0, 1, "relu");
ls[100] = make_convolutional_layer(384, 3, 1, 1, 1, "relu");

ls[101] = make_shortcut_layer(ls[97], 1, "linear");
ls[102] = make_convolutional_layer(48, 1, 1, 0, 1, "relu");
ls[103] = make_convolutional_layer(128, 5, 1, 2, 1, "relu");

ls[104] = make_shortcut_layer(ls[97], 1, "linear");
ls[105] = make_maxpool_layer(3, 1, 1);
ls[106] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");

Layer **inception5b = malloc(4*sizeof(Layer*));
ls[107] = make_inception_layer(inception5b, 4, 2);
inception5b[0] = ls[98];
inception5b[1] = ls[101];
inception5b[2] = ls[104];
inception5b[3] = ls[107];

ls[108] = make_avgpool_layer(7, 7, 0); // 106
ls[109] = make_dropout_layer(0.5);
ls[110] = make_connect_layer(100, 1, "linear");
ls[111] = make_crossentropy_layer(NULL, -1);
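It can help to trace the spatial size of the feature map through the layers that change it. The sketch below does this under two assumptions that the article does not confirm: the pooling arguments are (kernel size, stride, padding), the convolution arguments start with (filters, kernel size, stride, padding), and the output size follows the usual floor((in + 2*pad - k) / stride) + 1 formula. All stride-1 layers in the listing keep the size unchanged under that formula, so they are skipped. The helper names are hypothetical.

```c
#include <assert.h>

/* Output side length: floor((in + 2*pad - k) / stride) + 1. */
static int out_size(int in, int k, int stride, int pad)
{
    assert(stride > 0 && in + 2 * pad >= k);
    return (in + 2 * pad - k) / stride + 1;
}

/* Side length of the feature map that reaches the final average pool. */
static int size_before_avgpool(int input)
{
    int s = out_size(input, 7, 2, 3); /* ls[0]: 7x7 convolution, stride 2 */
    s = out_size(s, 3, 2, 1);         /* ls[2]: max pool */
    s = out_size(s, 3, 2, 1);         /* ls[6]: max pool */
    s = out_size(s, 3, 2, 1);         /* ls[29]: max pool after inception3b */
    s = out_size(s, 2, 2, 0);         /* ls[85]: max pool after inception4e */
    return s;
}
```

Under these assumptions, size_before_avgpool(224) returns 7, which matches the 7×7 window of ls[108], while size_before_avgpool(96) returns 3. With a 96×96 input it may therefore be worth checking how make_avgpool_layer treats a window larger than the feature map.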

We use the crossentropy layer as the classifier.

Next, create the session and set the training hyperparameters.

Session *sess = create_session(graph, 96, 96, 3, 100, type, path);
float *mean = calloc(3, sizeof(float));
float *std = calloc(3, sizeof(float));
mean[0] = 0.485;
mean[1] = 0.456;
mean[2] = 0.406;
std[0] = 0.229;
std[1] = 0.224;
std[2] = 0.225;
transform_normalize_sess(sess, mean, std);
transform_resize_sess(sess, 96, 96);
set_train_params(sess, 150, 64, 64, 0.0001);
SGDOptimizer_sess(sess, 0.9, 0, 0, 0, 0);
init_session(sess, "./data/cifar100/train.txt", "./data/cifar100/train_label.txt");

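As you can see, the dataset goes through some preprocessing. The images are first normalized, using mean and standard deviation values precomputed on ImageNet, and then resized to match the network's input size.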
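Per-channel normalization of this kind can be sketched as follows. The ImageNet statistics are defined for pixel values scaled to the 0..1 range, so the sketch divides by 255 first; whether transform_normalize_sess does the same scaling is an assumption. normalize_image is a hypothetical helper operating on a channel-major image.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Normalize a channel-major image in place:
 *   x = (x / 255 - mean[c]) / std[c]
 * Assumes pixel values start in 0..255; drop the division if they are
 * already scaled to 0..1.
 */
static void normalize_image(float *img, size_t plane, int channels,
                            const float *mean, const float *std)
{
    for (int c = 0; c < channels; ++c) {
        assert(std[c] > 0.0f);
        for (size_t i = 0; i < plane; ++i) {
            float *p = &img[(size_t)c * plane + i];
            *p = (*p / 255.0f - mean[c]) / std[c];
        }
    }
}
```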

We use the SGD optimizer to update the parameters.
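The 0.9 passed to SGDOptimizer_sess is presumably the momentum coefficient, with the remaining options left at zero; that reading of the arguments is an assumption. A plain SGD-with-momentum step, written as a hypothetical helper rather than the Lumos code, looks roughly like this:

```c
#include <assert.h>
#include <stddef.h>

/*
 * One SGD-with-momentum step (no dampening, weight decay, or Nesterov):
 *   v = momentum * v + grad
 *   w = w - lr * v
 */
static void sgd_momentum_step(float *w, float *v, const float *grad,
                              size_t n, float lr, float momentum)
{
    assert(lr > 0.0f && momentum >= 0.0f && momentum < 1.0f);
    for (size_t i = 0; i < n; ++i) {
        v[i] = momentum * v[i] + grad[i];
        w[i] -= lr * v[i];
    }
}
```

The velocity buffer v starts at zero and is kept between steps, so repeated gradients in the same direction build up a larger effective step.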

The complete code is as follows:

#include "googlenet.h"

void googlenet(char *type, char *path)
{
    Graph *graph = create_graph();
    Layer **ls = malloc(112*sizeof(Layer*));
    ls[0] = make_convolutional_layer(64, 7, 2, 3, 0, "linear");
    ls[1] = make_normalization_layer(0.1, 1, "relu");
    ls[2] = make_maxpool_layer(3, 2, 1);
    ls[3] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");
    ls[4] = make_convolutional_layer(192, 3, 1, 1, 0, "linear");
    ls[5] = make_normalization_layer(0.1, 1, "relu");
    ls[6] = make_maxpool_layer(3, 2, 1);

    ls[7] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    ls[8] = make_shortcut_layer(ls[7], 1, "linear");
    ls[9] = make_convolutional_layer(96, 1, 1, 0, 1, "relu");
    ls[10] = make_convolutional_layer(128, 3, 1, 1, 1, "relu");

    ls[11] = make_shortcut_layer(ls[7], 1, "linear");
    ls[12] = make_convolutional_layer(16, 1, 1, 0, 1, "relu");
    ls[13] = make_convolutional_layer(32, 5, 1, 2, 1, "relu");

    ls[14] = make_shortcut_layer(ls[7], 1, "linear");
    ls[15] = make_maxpool_layer(3, 1, 1);
    ls[16] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");

    Layer **inception3a = malloc(4*sizeof(Layer*));
    ls[17] = make_inception_layer(inception3a, 4, 2);
    inception3a[0] = ls[8];
    inception3a[1] = ls[11];
    inception3a[2] = ls[14];
    inception3a[3] = ls[17];

    ls[18] = make_convolutional_layer(128, 1, 1, 0, 1, "relu"); // 16

    ls[19] = make_shortcut_layer(ls[18], 1, "linear");
    ls[20] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");
    ls[21] = make_convolutional_layer(192, 3, 1, 1, 1, "relu");

    ls[22] = make_shortcut_layer(ls[18], 1, "linear");
    ls[23] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
    ls[24] = make_convolutional_layer(96, 5, 1, 2, 1, "relu");

    ls[25] = make_shortcut_layer(ls[18], 1, "linear");
    ls[26] = make_maxpool_layer(3, 1, 1);
    ls[27] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    Layer **inception3b = malloc(4*sizeof(Layer*));
    ls[28] = make_inception_layer(inception3b, 4, 2);
    inception3b[0] = ls[19];
    inception3b[1] = ls[22];
    inception3b[2] = ls[25];
    inception3b[3] = ls[28];

    ls[29] = make_maxpool_layer(3, 2, 1); // 27

    ls[30] = make_convolutional_layer(192, 1, 1, 0, 1, "relu");

    ls[31] = make_shortcut_layer(ls[30], 1, "linear");
    ls[32] = make_convolutional_layer(96, 1, 1, 0, 1, "relu");
    ls[33] = make_convolutional_layer(208, 3, 1, 1, 1, "relu");

    ls[34] = make_shortcut_layer(ls[30], 1, "linear");
    ls[35] = make_convolutional_layer(16, 1, 1, 0, 1, "relu");
    ls[36] = make_convolutional_layer(48, 5, 1, 2, 1, "relu");

    ls[37] = make_shortcut_layer(ls[30], 1, "linear");
    ls[38] = make_maxpool_layer(3, 1, 1);
    ls[39] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    Layer **inception4a = malloc(4*sizeof(Layer*));
    ls[40] = make_inception_layer(inception4a, 4, 2);
    inception4a[0] = ls[31];
    inception4a[1] = ls[34];
    inception4a[2] = ls[37];
    inception4a[3] = ls[40];

    // auxiliary classifier position
    ls[41] = make_convolutional_layer(160, 1, 1, 0, 1, "relu"); // 39

    ls[42] = make_shortcut_layer(ls[41], 1, "linear");
    ls[43] = make_convolutional_layer(112, 1, 1, 0, 1, "relu");
    ls[44] = make_convolutional_layer(224, 3, 1, 1, 1, "relu");

    ls[45] = make_shortcut_layer(ls[41], 1, "linear");
    ls[46] = make_convolutional_layer(24, 1, 1, 0, 1, "relu");
    ls[47] = make_convolutional_layer(64, 5, 1, 2, 1, "relu");

    ls[48] = make_shortcut_layer(ls[41], 1, "linear");
    ls[49] = make_maxpool_layer(3, 1, 1);
    ls[50] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    Layer **inception4b = malloc(4*sizeof(Layer*));
    ls[51] = make_inception_layer(inception4b, 4, 2);
    inception4b[0] = ls[42];
    inception4b[1] = ls[45];
    inception4b[2] = ls[48];
    inception4b[3] = ls[51];

    ls[52] = make_convolutional_layer(128, 1, 1, 0, 1, "relu"); // 50

    ls[53] = make_shortcut_layer(ls[52], 1, "linear");
    ls[54] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");
    ls[55] = make_convolutional_layer(256, 3, 1, 1, 1, "relu");

    ls[56] = make_shortcut_layer(ls[52], 1, "linear");
    ls[57] = make_convolutional_layer(24, 1, 1, 0, 1, "relu");
    ls[58] = make_convolutional_layer(64, 5, 1, 2, 1, "relu");

    ls[59] = make_shortcut_layer(ls[52], 1, "linear");
    ls[60] = make_maxpool_layer(3, 1, 1);
    ls[61] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    Layer **inception4c = malloc(4*sizeof(Layer*));
    ls[62] = make_inception_layer(inception4c, 4, 2);
    inception4c[0] = ls[53];
    inception4c[1] = ls[56];
    inception4c[2] = ls[59];
    inception4c[3] = ls[62];

    ls[63] = make_convolutional_layer(112, 1, 1, 0, 1, "relu"); // 61

    ls[64] = make_shortcut_layer(ls[63], 1, "linear");
    ls[65] = make_convolutional_layer(144, 1, 1, 0, 1, "relu");
    ls[66] = make_convolutional_layer(288, 3, 1, 1, 1, "relu");

    ls[67] = make_shortcut_layer(ls[63], 1, "linear");
    ls[68] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
    ls[69] = make_convolutional_layer(64, 5, 1, 2, 1, "relu");

    ls[70] = make_shortcut_layer(ls[63], 1, "linear");
    ls[71] = make_maxpool_layer(3, 1, 1);
    ls[72] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    Layer **inception4d = malloc(4*sizeof(Layer*));
    ls[73] = make_inception_layer(inception4d, 4, 2);
    inception4d[0] = ls[64];
    inception4d[1] = ls[67];
    inception4d[2] = ls[70];
    inception4d[3] = ls[73];

    // auxiliary classifier position
    ls[74] = make_convolutional_layer(256, 1, 1, 0, 1, "relu"); // 72

    ls[75] = make_shortcut_layer(ls[74], 1, "linear");
    ls[76] = make_convolutional_layer(160, 1, 1, 0, 1, "relu");
    ls[77] = make_convolutional_layer(320, 3, 1, 1, 1, "relu");

    ls[78] = make_shortcut_layer(ls[74], 1, "linear");
    ls[79] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
    ls[80] = make_convolutional_layer(128, 5, 1, 2, 1, "relu");

    ls[81] = make_shortcut_layer(ls[74], 1, "linear");
    ls[82] = make_maxpool_layer(3, 1, 1);
    ls[83] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");

    Layer **inception4e = malloc(4*sizeof(Layer*));
    ls[84] = make_inception_layer(inception4e, 4, 2);
    inception4e[0] = ls[75];
    inception4e[1] = ls[78];
    inception4e[2] = ls[81];
    inception4e[3] = ls[84];

    ls[85] = make_maxpool_layer(2, 2, 0); // 83 25

    ls[86] = make_convolutional_layer(256, 1, 1, 0, 1, "relu");

    ls[87] = make_shortcut_layer(ls[86], 1, "linear");
    ls[88] = make_convolutional_layer(160, 1, 1, 0, 1, "relu");
    ls[89] = make_convolutional_layer(320, 3, 1, 1, 1, "relu");

    ls[90] = make_shortcut_layer(ls[86], 1, "linear");
    ls[91] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
    ls[92] = make_convolutional_layer(128, 5, 1, 2, 1, "relu");

    ls[93] = make_shortcut_layer(ls[86], 1, "linear");
    ls[94] = make_maxpool_layer(3, 1, 1);
    ls[95] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");

    Layer **inception5a = malloc(4*sizeof(Layer*));
    ls[96] = make_inception_layer(inception5a, 4, 2);
    inception5a[0] = ls[87];
    inception5a[1] = ls[90];
    inception5a[2] = ls[93];
    inception5a[3] = ls[96];

    ls[97] = make_convolutional_layer(384, 1, 1, 0, 1, "relu"); // 95

    ls[98] = make_shortcut_layer(ls[97], 1, "linear");
    ls[99] = make_convolutional_layer(192, 1, 1, 0, 1, "relu");
    ls[100] = make_convolutional_layer(384, 3, 1, 1, 1, "relu");

    ls[101] = make_shortcut_layer(ls[97], 1, "linear");
    ls[102] = make_convolutional_layer(48, 1, 1, 0, 1, "relu");
    ls[103] = make_convolutional_layer(128, 5, 1, 2, 1, "relu");

    ls[104] = make_shortcut_layer(ls[97], 1, "linear");
    ls[105] = make_maxpool_layer(3, 1, 1);
    ls[106] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");

    Layer **inception5b = malloc(4*sizeof(Layer*));
    ls[107] = make_inception_layer(inception5b, 4, 2);
    inception5b[0] = ls[98];
    inception5b[1] = ls[101];
    inception5b[2] = ls[104];
    inception5b[3] = ls[107];

    ls[108] = make_avgpool_layer(7, 7, 0); // 106
    ls[109] = make_dropout_layer(0.5);
    ls[110] = make_connect_layer(100, 1, "linear");
    ls[111] = make_crossentropy_layer(NULL, -1);

    for (int i = 0; i < 112; ++i) {
        append_layer2grpah(graph, ls[i]);
        Layer *l = ls[i];
        if (l->type == CONVOLUTIONAL){
            init_kaiming_uniform_kernel(l, sqrt(5.0), "fan_in", "relu");
            init_constant_bias(l, 0);
        }
        if (l->type == CONNECT){
            init_kaiming_normal_kernel(l, sqrt(5.0), "fan_in", "relu");
            init_constant_bias(l, 0);
        }
    }

    Session *sess = create_session(graph, 96, 96, 3, 100, type, path);
    float *mean = calloc(3, sizeof(float));
    float *std = calloc(3, sizeof(float));
    mean[0] = 0.485;
    mean[1] = 0.456;
    mean[2] = 0.406;
    std[0] = 0.229;
    std[1] = 0.224;
    std[2] = 0.225;
    transform_normalize_sess(sess, mean, std);
    transform_resize_sess(sess, 96, 96);
    set_train_params(sess, 150, 64, 64, 0.0001);
    SGDOptimizer_sess(sess, 0.9, 0, 0, 0, 0);
    init_session(sess, "./data/cifar100/train.txt", "./data/cifar100/train_label.txt");
    train(sess);
}

void googlenet_detect(char *type, char *path)
{
    Graph *graph = create_graph();
    Layer **ls = malloc(112*sizeof(Layer*));
    ls[0] = make_convolutional_layer(64, 7, 2, 3, 0, "linear");
    ls[1] = make_normalization_layer(0.1, 1, "relu");
    ls[2] = make_maxpool_layer(3, 2, 1);
    ls[3] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");
    ls[4] = make_convolutional_layer(192, 3, 1, 1, 0, "linear");
    ls[5] = make_normalization_layer(0.1, 1, "relu");
    ls[6] = make_maxpool_layer(3, 2, 1);

    ls[7] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    ls[8] = make_shortcut_layer(ls[7], 1, "linear");
    ls[9] = make_convolutional_layer(96, 1, 1, 0, 1, "relu");
    ls[10] = make_convolutional_layer(128, 3, 1, 1, 1, "relu");

    ls[11] = make_shortcut_layer(ls[7], 1, "linear");
    ls[12] = make_convolutional_layer(16, 1, 1, 0, 1, "relu");
    ls[13] = make_convolutional_layer(32, 5, 1, 2, 1, "relu");

    ls[14] = make_shortcut_layer(ls[7], 1, "linear");
    ls[15] = make_maxpool_layer(3, 1, 1);
    ls[16] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");

    Layer **inception3a = malloc(4*sizeof(Layer*));
    ls[17] = make_inception_layer(inception3a, 4, 2);
    inception3a[0] = ls[8];
    inception3a[1] = ls[11];
    inception3a[2] = ls[14];
    inception3a[3] = ls[17];

    ls[18] = make_convolutional_layer(128, 1, 1, 0, 1, "relu"); // 16

    ls[19] = make_shortcut_layer(ls[18], 1, "linear");
    ls[20] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");
    ls[21] = make_convolutional_layer(192, 3, 1, 1, 1, "relu");

    ls[22] = make_shortcut_layer(ls[18], 1, "linear");
    ls[23] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
    ls[24] = make_convolutional_layer(96, 5, 1, 2, 1, "relu");

    ls[25] = make_shortcut_layer(ls[18], 1, "linear");
    ls[26] = make_maxpool_layer(3, 1, 1);
    ls[27] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    Layer **inception3b = malloc(4*sizeof(Layer*));
    ls[28] = make_inception_layer(inception3b, 4, 2);
    inception3b[0] = ls[19];
    inception3b[1] = ls[22];
    inception3b[2] = ls[25];
    inception3b[3] = ls[28];

    ls[29] = make_maxpool_layer(3, 2, 1); // 27

    ls[30] = make_convolutional_layer(192, 1, 1, 0, 1, "relu");

    ls[31] = make_shortcut_layer(ls[30], 1, "linear");
    ls[32] = make_convolutional_layer(96, 1, 1, 0, 1, "relu");
    ls[33] = make_convolutional_layer(208, 3, 1, 1, 1, "relu");

    ls[34] = make_shortcut_layer(ls[30], 1, "linear");
    ls[35] = make_convolutional_layer(16, 1, 1, 0, 1, "relu");
    ls[36] = make_convolutional_layer(48, 5, 1, 2, 1, "relu");

    ls[37] = make_shortcut_layer(ls[30], 1, "linear");
    ls[38] = make_maxpool_layer(3, 1, 1);
    ls[39] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    Layer **inception4a = malloc(4*sizeof(Layer*));
    ls[40] = make_inception_layer(inception4a, 4, 2);
    inception4a[0] = ls[31];
    inception4a[1] = ls[34];
    inception4a[2] = ls[37];
    inception4a[3] = ls[40];

    // auxiliary classifier position
    ls[41] = make_convolutional_layer(160, 1, 1, 0, 1, "relu"); // 39

    ls[42] = make_shortcut_layer(ls[41], 1, "linear");
    ls[43] = make_convolutional_layer(112, 1, 1, 0, 1, "relu");
    ls[44] = make_convolutional_layer(224, 3, 1, 1, 1, "relu");

    ls[45] = make_shortcut_layer(ls[41], 1, "linear");
    ls[46] = make_convolutional_layer(24, 1, 1, 0, 1, "relu");
    ls[47] = make_convolutional_layer(64, 5, 1, 2, 1, "relu");

    ls[48] = make_shortcut_layer(ls[41], 1, "linear");
    ls[49] = make_maxpool_layer(3, 1, 1);
    ls[50] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    Layer **inception4b = malloc(4*sizeof(Layer*));
    ls[51] = make_inception_layer(inception4b, 4, 2);
    inception4b[0] = ls[42];
    inception4b[1] = ls[45];
    inception4b[2] = ls[48];
    inception4b[3] = ls[51];

    ls[52] = make_convolutional_layer(128, 1, 1, 0, 1, "relu"); // 50

    ls[53] = make_shortcut_layer(ls[52], 1, "linear");
    ls[54] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");
    ls[55] = make_convolutional_layer(256, 3, 1, 1, 1, "relu");

    ls[56] = make_shortcut_layer(ls[52], 1, "linear");
    ls[57] = make_convolutional_layer(24, 1, 1, 0, 1, "relu");
    ls[58] = make_convolutional_layer(64, 5, 1, 2, 1, "relu");

    ls[59] = make_shortcut_layer(ls[52], 1, "linear");
    ls[60] = make_maxpool_layer(3, 1, 1);
    ls[61] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    Layer **inception4c = malloc(4*sizeof(Layer*));
    ls[62] = make_inception_layer(inception4c, 4, 2);
    inception4c[0] = ls[53];
    inception4c[1] = ls[56];
    inception4c[2] = ls[59];
    inception4c[3] = ls[62];

    ls[63] = make_convolutional_layer(112, 1, 1, 0, 1, "relu"); // 61

    ls[64] = make_shortcut_layer(ls[63], 1, "linear");
    ls[65] = make_convolutional_layer(144, 1, 1, 0, 1, "relu");
    ls[66] = make_convolutional_layer(288, 3, 1, 1, 1, "relu");

    ls[67] = make_shortcut_layer(ls[63], 1, "linear");
    ls[68] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
    ls[69] = make_convolutional_layer(64, 5, 1, 2, 1, "relu");

    ls[70] = make_shortcut_layer(ls[63], 1, "linear");
    ls[71] = make_maxpool_layer(3, 1, 1);
    ls[72] = make_convolutional_layer(64, 1, 1, 0, 1, "relu");

    Layer **inception4d = malloc(4*sizeof(Layer*));
    ls[73] = make_inception_layer(inception4d, 4, 2);
    inception4d[0] = ls[64];
    inception4d[1] = ls[67];
    inception4d[2] = ls[70];
    inception4d[3] = ls[73];

    // auxiliary classifier position
    ls[74] = make_convolutional_layer(256, 1, 1, 0, 1, "relu"); // 72

    ls[75] = make_shortcut_layer(ls[74], 1, "linear");
    ls[76] = make_convolutional_layer(160, 1, 1, 0, 1, "relu");
    ls[77] = make_convolutional_layer(320, 3, 1, 1, 1, "relu");

    ls[78] = make_shortcut_layer(ls[74], 1, "linear");
    ls[79] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
    ls[80] = make_convolutional_layer(128, 5, 1, 2, 1, "relu");

    ls[81] = make_shortcut_layer(ls[74], 1, "linear");
    ls[82] = make_maxpool_layer(3, 1, 1);
    ls[83] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");

    Layer **inception4e = malloc(4*sizeof(Layer*));
    ls[84] = make_inception_layer(inception4e, 4, 2);
    inception4e[0] = ls[75];
    inception4e[1] = ls[78];
    inception4e[2] = ls[81];
    inception4e[3] = ls[84];

    ls[85] = make_maxpool_layer(2, 2, 0); // 83 25

    ls[86] = make_convolutional_layer(256, 1, 1, 0, 1, "relu");

    ls[87] = make_shortcut_layer(ls[86], 1, "linear");
    ls[88] = make_convolutional_layer(160, 1, 1, 0, 1, "relu");
    ls[89] = make_convolutional_layer(320, 3, 1, 1, 1, "relu");

    ls[90] = make_shortcut_layer(ls[86], 1, "linear");
    ls[91] = make_convolutional_layer(32, 1, 1, 0, 1, "relu");
    ls[92] = make_convolutional_layer(128, 5, 1, 2, 1, "relu");

    ls[93] = make_shortcut_layer(ls[86], 1, "linear");
    ls[94] = make_maxpool_layer(3, 1, 1);
    ls[95] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");

    Layer **inception5a = malloc(4*sizeof(Layer*));
    ls[96] = make_inception_layer(inception5a, 4, 2);
    inception5a[0] = ls[87];
    inception5a[1] = ls[90];
    inception5a[2] = ls[93];
    inception5a[3] = ls[96];

    ls[97] = make_convolutional_layer(384, 1, 1, 0, 1, "relu"); // 95

    ls[98] = make_shortcut_layer(ls[97], 1, "linear");
    ls[99] = make_convolutional_layer(192, 1, 1, 0, 1, "relu");
    ls[100] = make_convolutional_layer(384, 3, 1, 1, 1, "relu");

    ls[101] = make_shortcut_layer(ls[97], 1, "linear");
    ls[102] = make_convolutional_layer(48, 1, 1, 0, 1, "relu");
    ls[103] = make_convolutional_layer(128, 5, 1, 2, 1, "relu");

    ls[104] = make_shortcut_layer(ls[97], 1, "linear");
    ls[105] = make_maxpool_layer(3, 1, 1);
    ls[106] = make_convolutional_layer(128, 1, 1, 0, 1, "relu");

    Layer **inception5b = malloc(4*sizeof(Layer*));
    ls[107] = make_inception_layer(inception5b, 4, 2);
    inception5b[0] = ls[98];
    inception5b[1] = ls[101];
    inception5b[2] = ls[104];
    inception5b[3] = ls[107];

    ls[108] = make_avgpool_layer(7, 7, 0); // 106
    ls[109] = make_dropout_layer(0.5);
    ls[110] = make_connect_layer(100, 1, "linear");
    ls[111] = make_crossentropy_layer(NULL, -1);

    for (int i = 0; i < 112; ++i) {
        append_layer2grpah(graph, ls[i]);
    }

    Session *sess = create_session(graph, 96, 96, 3, 100, type, path);
    float *mean = calloc(3, sizeof(float));
    float *std = calloc(3, sizeof(float));
    mean[0] = 0.485;
    mean[1] = 0.456;
    mean[2] = 0.406;
    std[0] = 0.229;
    std[1] = 0.224;
    std[2] = 0.225;
    transform_normalize_sess(sess, mean, std);
    transform_resize_sess(sess, 96, 96);
    set_detect_params(sess);
    init_session(sess, "./data/cifar100/train.txt", "./data/cifar100/train_label.txt");
    detect_classification(sess);
}

You can find googlenet.c in the demo directory of the Lumos framework; it is our implementation of the GoogLeNet model.

Results

After 150 epochs of training, the network reaches a top-1 classification accuracy of about 90%.