MODEL ROBUSTNESS
Question
How random variations in the seed and the antiseed affect the pruning step.
Approach
We emulate random variations by iterating over various training/test set randomly drawn from the same set of annotated data. We define a default model architecture, train as many models as training/test sets and compare the predictions of the different models out of sample (10,000) on a common benchmark of families/publications drawn from the expansion set.
Specifically, we look at:
moments of the distribution of standard errors of models predictions (std computed at the family/publication level)
models' consensus regarding positive and negative examples
Results
additivemanufacturing - cnn
Score dispersion
std_score
count
10000
mean
0.09
std
0.054
min
0
25%
0.055
50%
0.082
75%
0.113
max
0.384
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
9
747
0.145
0.145
8
520
0.101
0.246
7
595
0.115
0.361
6
486
0.094
0.455
5
510
0.099
0.554
4
560
0.109
0.663
3
603
0.117
0.78
2
536
0.104
0.884
1
600
0.116
1
0
0
0
1
additivemanufacturing - mlp
Score dispersion
std_score
count
10000
mean
0.041
std
0.039
min
0
25%
0.014
50%
0.029
75%
0.056
max
0.276
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
320
0.225
0.225
9
126
0.089
0.314
8
112
0.079
0.393
7
42
0.03
0.423
6
114
0.08
0.503
5
95
0.067
0.57
4
124
0.087
0.657
3
129
0.091
0.748
2
152
0.107
0.855
1
206
0.145
1
0
0
0
1
blockchain - cnn
Score dispersion
std_score
count
10000
mean
0.074
std
0.073
min
0
25%
0.012
50%
0.047
75%
0.126
max
0.389
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
410
0.084
0.084
9
360
0.074
0.158
8
352
0.072
0.23
7
336
0.069
0.299
6
360
0.074
0.373
5
390
0.08
0.453
4
380
0.078
0.531
3
357
0.073
0.604
2
1410
0.289
0.893
1
522
0.107
1
0
0
0
1
blockchain - mlp
Score dispersion
std_score
count
10000
mean
0.025
std
0.04
min
0
25%
0.003
50%
0.008
75%
0.03
max
0.272
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
160
0.179
0.179
9
18
0.02
0.199
8
24
0.027
0.226
7
42
0.047
0.273
6
66
0.074
0.346
5
125
0.14
0.486
4
56
0.063
0.549
3
105
0.117
0.666
2
122
0.136
0.802
1
177
0.198
1
0
0
0
1
computervision - cnn
Score dispersion
std_score
count
10000
mean
0.065
std
0.074
min
0
25%
0.003
50%
0.029
75%
0.121
max
0.383
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
3590
0.381
0.381
9
810
0.086
0.467
8
504
0.054
0.521
7
483
0.051
0.572
6
468
0.05
0.622
5
400
0.042
0.664
4
340
0.036
0.7
3
330
0.035
0.735
2
2072
0.22
0.955
1
420
0.045
1
0
0
0
1
computervision - mlp
Score dispersion
std_score
count
10000
mean
0.035
std
0.046
min
0
25%
0.006
50%
0.015
75%
0.042
max
0.291
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
3310
0.55
0.55
9
684
0.114
0.664
8
360
0.06
0.724
7
357
0.059
0.783
6
300
0.05
0.833
5
225
0.037
0.87
4
212
0.035
0.906
3
204
0.034
0.939
2
180
0.03
0.969
1
184
0.031
1
0
0
0
1
genomeediting - cnn
Score dispersion
std_score
count
10000
mean
0.03
std
0.053
min
0
25%
0
50%
0.001
75%
0.023
max
0.329
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
450
0.126
0.126
9
90
0.025
0.152
8
88
0.025
0.176
7
70
0.02
0.196
6
96
0.027
0.223
5
90
0.025
0.248
4
280
0.079
0.327
3
1866
0.524
0.851
2
356
0.1
0.951
1
175
0.049
1
0
0
0
1
genomeediting - mlp
Score dispersion
std_score
count
10000
mean
0.011
std
0.021
min
0
25%
0.001
50%
0.003
75%
0.01
max
0.205
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
360
0.462
0.462
9
63
0.081
0.543
8
56
0.072
0.615
7
28
0.036
0.651
6
60
0.077
0.728
5
40
0.051
0.779
4
60
0.077
0.856
3
33
0.042
0.899
2
48
0.062
0.96
1
31
0.04
1
0
0
0
1
hydrogenstorage - cnn
Score dispersion
std_score
count
10000
mean
0.046
std
0.04
min
0
25%
0.011
50%
0.037
75%
0.076
max
0.311
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
230
0.213
0.213
9
117
0.108
0.321
8
120
0.111
0.432
7
91
0.084
0.516
6
90
0.083
0.599
5
75
0.069
0.669
4
100
0.093
0.761
3
72
0.067
0.828
2
84
0.078
0.906
1
102
0.094
1
0
0
0
1
hydrogenstorage - mlp
Score dispersion
std_score
count
10000
mean
0.026
std
0.031
min
0.001
25%
0.007
50%
0.015
75%
0.03
max
0.257
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
160
0.21
0.21
9
54
0.071
0.281
8
40
0.052
0.333
7
63
0.083
0.416
6
78
0.102
0.518
5
65
0.085
0.604
4
80
0.105
0.709
3
60
0.079
0.787
2
72
0.094
0.882
1
90
0.118
1
0
0
0
1
naturallanguageprocessing - cnn
Score dispersion
std_score
count
10000
mean
0.058
std
0.072
min
0
25%
0.001
50%
0.018
75%
0.114
max
0.39
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
1630
0.199
0.199
9
666
0.081
0.281
8
520
0.064
0.344
7
301
0.037
0.381
6
354
0.043
0.424
5
475
0.058
0.482
4
2972
0.363
0.846
3
525
0.064
0.91
2
384
0.047
0.957
1
353
0.043
1
0
0
0
1
naturallanguageprocessing - mlp
Score dispersion
std_score
count
10000
mean
0.028
std
0.036
min
0
25%
0.004
50%
0.012
75%
0.036
max
0.339
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
950
0.368
0.368
9
315
0.122
0.49
8
216
0.084
0.574
7
231
0.09
0.663
6
204
0.079
0.742
5
170
0.066
0.808
4
120
0.046
0.855
3
96
0.037
0.892
2
126
0.049
0.941
1
153
0.059
1
0
0
0
1
selfdrivingvehicle - cnn
Score dispersion
std_score
count
10000
mean
0.085
std
0.044
min
0
25%
0.052
50%
0.091
75%
0.111
max
0.324
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
610
0.086
0.086
9
765
0.107
0.193
8
504
0.071
0.263
7
483
0.068
0.331
6
468
0.066
0.397
5
390
0.055
0.451
4
488
0.068
0.52
3
423
0.059
0.579
2
2292
0.321
0.9
1
711
0.1
1
0
0
0
1
selfdrivingvehicle - mlp
Score dispersion
std_score
count
10000
mean
0.049
std
0.036
min
0.002
25%
0.021
50%
0.039
75%
0.069
max
0.278
Models consensus
nb_models
nb_positives
share_positives
cumshare_positives
10
1210
0.302
0.302
9
414
0.104
0.406
8
336
0.084
0.49
7
294
0.074
0.564
6
276
0.069
0.633
5
335
0.084
0.716
4
308
0.077
0.793
3
243
0.061
0.854
2
266
0.066
0.92
1
318
0.08
1
0
0
0
1