-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy patha4amanual.Rnw
1478 lines (1156 loc) · 77.4 KB
/
a4amanual.Rnw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[a4paper,english,11pt]{article}
\usepackage{a4a}
\begin{document}
\title{Stock assessment with the \aFa statistical catch-at-age framework}
\input{authors}
\maketitle
\begin{abstract}
This chapter presents the statistical catch-at-age stock assessment model developed as part of the Assessment For All (\aFa) initiative of the European Commission Joint Research Centre. The stock assessment model framework is a non-linear catch-at-age model implemented in \href{http://www.r-project.org/}{\R}, \href{http://www.flr-project.org/}{\FLR} and \href{http://www.admb-project.org/}{\ADMB}, that can be applied rapidly to a wide range of situations with low parametrization requirements. The model structure is defined by submodels, which are the different parts that require structural assumptions. There are 5 submodels in operation: F-at-age, abundance indices catchability-at-age, recruitment, observation variance of catch-at-age and abundance indices-at-age, and population's age structurein the first year. All submodels use the same type of specification process, the \R formula interface, wich gives lot's of flexibility to explore models and combination of submodels.
\end{abstract}
\newpage
\tableofcontents
\newpage
<<knitr_opts, echo=FALSE, message=FALSE, warning=FALSE>>=
library(knitr)
library(formatR)
#thm = knit_theme$get("bclear") #moe, bclear
#knit_theme$set(thm)
opts_chunk$set(dev='png', cache=TRUE, fig.align='center', warning=FALSE, message=FALSE, dev.args=list(type="cairo"), dpi=96, highlight=TRUE, background='#F2F2F2', fig.lp="fig:", fig.pos="H", width=70, tidy=TRUE, out.width='.9\\linewidth')
# lattice theme
@
\section{Before starting}
\subsection{License, documentation and development status}
The software is released under the \href{https://joinup.ec.europa.eu/community/eupl/home}{EUPL 1.1}.
For more information on the \aFa methodologies refer to \href{http://icesjms.oxfordjournals.org/content/early/2014/04/03/icesjms.fsu050.abstract}{Jardim, et.al, 2014}, \href{http://icesjms.oxfordjournals.org/content/early/2014/03/31/icesjms.fsu043.abstract}{Millar, et.al, 2014} and \href{http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0154922}{Scott, et.al, 2016}.
Documentation can be found at \url{http://flr-project.org/FLa4a}. You are welcome to:
\begin{itemize}
\item Submit suggestions and bug-reports at: \url{https://github.com/flr/FLa4a/issues}
\item Send a pull request on: \url{https://github.com/flr/FLa4a/}
\item Compose a friendly e-mail to the maintainer, see `packageDescription('FLa4a')`
\end{itemize}
\subsection{Installing and loading libraries}
To run the \code{FLa4a} methods the reader will need to install the package and its dependencies and load them. Some datasets are distributed with the package and as such need to be loaded too.
<<eval=FALSE>>=
# from CRAN
install.packages(c("copula","triangle", "coda", "grid", "gridExtra", "latticeExtra"))
# from FLR
install.packages(c("FLCore", "FLa4a"), repos="http://flr-project.org/R")
@
<<load_libraries, message=FALSE, warning=FALSE>>=
# libraries
library(devtools)
library(FLa4a)
library(XML)
library(reshape2)
library(ggplotFL)
# datasets
data(ple4)
data(ple4.indices)
data(ple4.index)
data(rfLen)
@
<<>>=
packageVersion("FLCore")
packageVersion("FLa4a")
@
\subsection{How to read this document}
The target audience for this document are readers with some experience in R and some background on stock assessment.
The document explains the approach being developed by \aFa for fish stock assessment and scientific advice. It presents a mixture of text and code, where the first explains the concepts behind the methods, while the last shows how these can be run with the software provided. Moreover, having the code allows the reader to copy/paste and replicate the analysis presented here.
The sections and subsections are as independent as possible, so it can be used as a reference document for the \code{FLa4a}.
\subsection{Notation}
Along this chapter the notation presented in Table~\ref{tab:mathsnotation} will be used. Mathematical descriptions will be kept as simple as possible for readability.
\begin{table}
\caption{Mathematical notation}
\label{tab:mathsnotation}
\begin{center}
\begin{tabular}{cc}
\hline
Symbol & Description \\
\hline
\hline
Variables & \\
$C$ & catches \\
$F$ & fishing mortality \\
$M$ & natural mortality \\
$R$ & recruitment \\
$Q$ & vessel or fleet catchability \\
$w$ & weights \\
$l$ & likelihood \\
$I$ & abundance index \\
$S$ & spawning stock biomass \\
$CV$ & coefficient of variation \\
$D$ & residuals or deviances \\
$N$ & normal distribution \\
$\beta$ & parameter \\
$a$ & stock-recruitment parameter \\
$b$ & stock-recruitment parameter \\
$\sigma^2$ & variance of catch \\
$\tau^2$ & variance of index \\
$\phi^2$ & variance of predicted recruitment \\
$\upsilon^2$ & variance of residuals \\
Subscripts & \\
$a$ & age \\
$y$ & year \\
$C$ & catch \\
$I$ & abundance index \\
$N$ & normal distribution \\
$s$ & survey \\
$SR$ & stock recruitment relationship \\
Superscripts and accents & \\
$\hat{}$ & observation \\
$\tilde{}$ & prediction \\
$c$ & catches \\
$s$ & abundance index \\
\hline
\end{tabular}
\end{center}
\end{table}
\section{Introduction}
The \aFa stock assessment framework is based in a non-linear catch-at-age model implemented in \R, \FLR and \ADMB that can be applied rapidly to a wide range of situations with low setup requirements.
The framework is built of submodels which define the different parts of a statistical catch at age model that require structural assumptions. In the \aFa framework these are fishing mortality-at-age, abundance indicies catchability-at-age, recruitment, observation variances of catch-at-age and abundance indices-at-age, and abundance-at-age in the first year of the data series (see section~\ref{sec:math} for details).
Other important processes, like natural mortality, individual growth and reproduction, are treated as fixed(inputs??), as it's common in stock assessment methods. Nevertheless, the \aFa framework provides methods to condition these processes prior to the model fit, and propagate their uncertainty into the assessment process. See chapters XX and section XX.
The submodels formulation uses linear models, which opens the possibility of using the linear modelling tools available in \R. For example, \href{http://cran.r-project.org/web/packages/mgcv/index.html}{\code{mgcv}} gam formulas or factorial design formulas using \code{lm()}.
The 'language' of linear models has been developing within the statistical community for many years, and constitutes an elegant way of defining models without going through the complexity of mathematical representations. This approach makes it also easier to communicate among scientists:
\begin{itemize}
\item \href{http://rspa.royalsocietypublishing.org/content/283/1393/147.short}{1965 J. A. Nelder}, notation for randomized block design
\item \href{http://www.jstor.org/stable/info/2346786}{1973 Wilkinson and Rodgers}, symbolic description for factorial designs
\item \href{http://books.google.com/books?isbn=0412343908}{1990 Hastie and Tibshirani}, introduced notation for smoothers
\item \href{http://books.google.com/books?isbn=041283040X}{1991 Chambers and Hastie}, further developed for use in S
\end{itemize}
\section{Stock assessment framework maths description}\label{sec:math}
The stock assessment model behind the framework is based in two main equations to link observations to processes, the Baranov's catch equation and abundance indices equation.
Catches in numbers by age and year are defined in terms of the three quantities: natural mortality, fishing mortality and recruitment; using a modified form of the well known Baranov catch equation:
$$C_{ay} = \frac{F_{ay}}{F_{ay}+M_{ay}}\left(1 - e^{-(F_{ay}+M_{ay})}\right) R_{y}e^{-\sum (F_{ay} + M_{ay})} $$
Survey indices by age and year are defined in terms of the same three quantities with the addition of survey catchability:
$$I_{ays} = Q_{ays} R_{y}e^{-\sum (F_{ay} + M_{ay})}$$
Observed catches and observed survey indices are assumed to be log-normally distributed, or equivalently, normally distributed on the log-scale, with specific observation variance:
$$ \log \hat{C}_{ay} \sim \text{Normal} \Big( \log C_{ay}, \sigma^2_{ay}\Big) $$
$$ \log \hat{I}_{ays} \sim \text{Normal} \Big( \log I_{ays}, \tau^2_{ays} \Big) $$
The log-likelihood can now be defined as the sum of the log-likelihood of the observed catches:
$$ \ell_C = \sum_{ay} w^{(c)}_{ay}\ \ell_N \Big( \log C_{ay}, \sigma^2_{ay} ;\ \log \hat{C}_{ay} \Big) $$
and the log-likelihood of the observed survey indices as:
$$ \ell_I = \sum_s \sum_{ay} w^{(s)}_{ays}\ \ell_N \Big( \log I_{ays}, \tau_{ays}^2 ;\ \log \hat{I}_{ays} \Big)$$
giving the total log-likelihood
$$\ell = \ell_C + \ell_I$$
which is defined in terms of the strictly positive quantites, $M_{ay}$, $F_{ay}$, $Q_{ays}$ and $R_{y}$, and the observation variances $\sigma_{ay}$ and $\tau_{ays}$. As such, the log-likelihood is over-parameterised as there are many more parameters than observations. In order to reduce the number of parameters, $M_{ay}$ is assumed known (as is common).
%====================================================================
%THE FOLLOWING NEEDS REVISION, need to bring in N1 submod
%====================================================================
The remaining parameters are written in terms of a linear combination of covariates $x_{ayk}$, e.g.
$$\log F_{ay} = \sum_k \beta_k x_{ayk}$$
where $k$ is the number of parameters to be estimated and is sufficiently small. Using this tecnique the quantities $\log F$, $\log Q$, $\log \sigma$ and $\log \tau$
%$\log \text{initial\,age\,structure}$ % this is not present in the above
(in bold in the equations above) can be described by a reduced number of parameters. The following section has more discussion on the use of linear models in \aFa.
%====================================================================
The \aFa statistical catch-at-age model can addionally allow for a functional relationship that links predicted recruitment $\tilde{R}$ based on spawning stock biomass and modelled recruitment $R$, to be imposed as a fixed variance random effect. [NEEDS REVISION, sentence not clear]
Options for the relationship are the hard coded models Ricker, Beverton Holt, smooth hockeystick or geometric mean. This is implemented by including a third component in the log-likelihood:
$$\ell_{SR} = \sum_y \ell_N \Big( \log \tilde{R}_y(a, b), \phi_y^2 ;\ \log R_y \Big)$$
giving the total log-likelihood
$$\ell = \ell_C + \ell_I + \ell_{SR}$$
Using the (time varying) Ricker model as an example, predicted recruitment is
$$\tilde{R}_y(a_y,b_y) = a_y S_{y-1} e^{-b_y S_{y-1}}$$
where $S$ is spawning stock biomass derived from the model parameters $F$ and $R$, and the fixed quantites $M$ and mean weights by year and age. It is assumed that $R$ is log-normally distributed, or equivalently, normally distributed on the log-scale about the (log) recruitment predicted by the SR model $\tilde{R}$, with known variance $\phi^2$, i.e.
$$\log R_y \sim \text{Normal} \Big( \log \tilde{R}_y, \phi_y^2 \Big)$$
which leads to the definition of $\ell_{SR}$ given above. In all cases $a$ and $b$ are strictly positive, and with the quantities $F$, $R$, etc. linear models are used to parameterise $\log a$ and/or $\log b$, where relevant.
%====================================================================
%THE FOLLOWING NEEDS REVISION, this is not just the default I guess,
% it's always present since R predictions will be a mix of S/R and $\gamma$
%====================================================================
By default, recruitment $R$ as apposed to the reruitment predicted from a stock recruitment model $\tilde{R}$, is specified as a linear model with a parameter for each year, i.e.
$$\log R_y = \gamma_y$$
This is to allow modelled recruitment $R_y$ to be shrunk towards the stock recruitment model. However, if it is considered appropriate that recruitment can be determined exactly by a relationship with covariates, it is possible, to instead define $\log R$ in terms of a linear model in the same way as $\log F$, $\log Q$, $\log \sigma$ and $\log \tau$. %But this is pretty much the same as taking a geometric mean, with a model on log a, and making the variance very small.
%====================================================================
% WE NEED TO ADD SOMETHING ABOUT HOW THE PLUSGROUP IS MODELLED
%====================================================================
%====================================================================
\section{Submodel structure}\label{sec:submod}
The \aFa stock assessment framework allows the user to set up a large number of different models. The mechanics which provide this flexibility are designed around the concept of submodels. Each unknown variable that must be estimated is treated as a linear model, for which the user has to define the model structure using \R formulas, including \href{http://cran.r-project.org/web/packages/mgcv/index.html}{\code{mgcv}} gam formulas. All submodels use the same specification process, the \R formula interface, wich gives lot's of flexibility to explore models and combination of submodels.
There are 5 submodels in operation:
\begin{itemize}
\item a model for F-at-age ($F_{ay}$),
\item a (list) of model(s) for abundance indices catchability-at-age ($Q_{ays}$),
\item a model for recruitment ($R_y$),
\item a list of models for the observation variance of catch-at-age and abundance indices ($\{\sigma^2_{ay}, \tau^2_{ays}\}$),
\item a model for the initial age structure $N_{a,y=1}$,
\end{itemize}
When setting the structure of each submodel the user is in fact building the predictive model and its parameters. The optimization process, done through \ADMB, estimates the parameters and their variance-covariance matrix, allowing further analysis to be carried out, like simulation, prediction, diagnostics, etc. All the statistical machinery will be at the user's reach.
\subsection{Submodel building blocks and fundamental R formulas}
The elements available to build submodels formulas are 'age' and 'year', which can be used to build models with different structures.
In \R's linear modelling language, a constant model is coded as \code{\midtilde 1}, while a slope over time would simply be \code{\midtilde year}, a smoother over time \code{\midtilde s(year, k=10)}, a model with a coefficient for each year (also called dummy variables) would be \code{\midtilde factor(year)}. Transformations of the variables are as usual, e.g. \code{\midtilde sqrt(year)}, etc; while combinations of all the above can be done non-convergence will limit the possibilities.
Using the $F$ submodel as example the following specifies the models described in the previous paragraph:
<<fund_forms, fig.cap="Example of fundamental R formulas">>=
# models
m1 <- ~1
m2 <- ~ year
m3 <- ~ s(year, k=10)
m4 <- ~ factor(year)
m5 <- ~ sqrt(year)
# fits
fit1 <- sca(ple4, ple4.indices, fmodel=m1, fit="MP")
fit2 <- sca(ple4, ple4.indices, fmodel=m2, fit="MP")
fit3 <- sca(ple4, ple4.indices, fmodel=m3, fit="MP")
fit4 <- sca(ple4, ple4.indices, fmodel=m4, fit="MP")
fit5 <- sca(ple4, ple4.indices, fmodel=m5, fit="MP")
# plot
lst <- FLStocks(constant=ple4+fit1, linear=ple4+fit2, smooth=ple4+fit3, factor=ple4+fit4, sqrt=ple4+fit5)
lst <- lapply(lst, fbar)
lgnd <- list(points=FALSE, lines=TRUE, space='right')
xyplot(data~year, groups=qname, lst, auto.key=lgnd, type='l', ylab='fishing mortality')
@
The models above and their combinations can be used to model both 'age' and 'year'. The corresponding fits for age are:
<<fund_forms_age, fig.cap="Example of fundamental R formulas">>=
# models
m1 <- ~1
m2 <- ~ age
m3 <- ~ s(age, k=3)
m4 <- ~ factor(age)
m5 <- ~ sqrt(age)
# fits
fit1 <- sca(ple4, ple4.indices, fmodel=m1, fit="MP")
fit2 <- sca(ple4, ple4.indices, fmodel=m2, fit="MP")
fit3 <- sca(ple4, ple4.indices, fmodel=m3, fit="MP")
fit4 <- sca(ple4, ple4.indices, fmodel=m4, fit="MP")
fit5 <- sca(ple4, ple4.indices, fmodel=m5, fit="MP")
# plot
lst <- FLStocks(constant=ple4+fit1, linear=ple4+fit2, smooth=ple4+fit3, factor=ple4+fit4, sqrt=ple4+fit5)
lst <- lapply(lst, function(x) harvest(x)[,'2000'])
xyplot(data~age, groups=qname, lst, auto.key=lgnd, type='l', ylab='fishing mortality in 2000')
@
\subsection{The major effects available for modelling}
Although the building blocks for formulas are 'age' and 'year', in fact there are three effects that can be modelled for each submodel: 'age', 'year' and 'cohort'. As examples note the following models for fishing mortality.
<<>>=
# the age effect
ageeffect <- ~ factor(age)
# the year effect
yeareffect <- ~ factor(year)
# the cohort
cohorteffect <- ~ factor(year-age)
# the fits
fit1 <- sca(ple4, ple4.indices, fmodel=yeareffect)
fit2 <- sca(ple4, ple4.indices, fmodel=ageeffect)
fit3 <- sca(ple4, ple4.indices, fmodel=cohorteffect)
@
and the graphical representation of the three models in Figures~\ref{fig:majeffy} to \ref{fig:majeffc}.
<<majeffy, fig.cap="Major effects: the year effect (~ factor(year))">>=
wireframe(harvest(fit1), main='year effect')
@
<<majeffa, fig.cap="Major effects: the age effect (~ factor(age))">>=
wireframe(harvest(fit2), main='age effect')
@
<<majeffc, fig.cap="Major effects: the cohort effect (~ factor(year-age))">>=
wireframe(harvest(fit3), main='cohort effect')
@
\subsection{The submodel class and methods}
%====================================================================
% COLIN TO CHECK THE SECTION
%====================================================================
%
% Although the specification of each submodel is done through a R formula, internally the \aFa sca fit creates an object (of class 'submodel') which stores more information and allows a wide number of methods to be applied. The most important cases are prediction and simulation methods.
%
% The submodel class is a S4 class with the following slots:
%
% <<>>=
% showClass("submodel")
% @
%
% Objects of class 'submodel' are created in the fitting process using the formulas set by the user (or defaults if missing) and information from the fit.
%
% For example begining with a simple \aFa fit:
%
% <<>>=
% # fit a model with indices "IBTS_Q1" and "IBTS_Q3"
% fit0 <- sca(ple4, ple4.indices[c("IBTS_Q1", "IBTS_Q3")],
% fmodel = ~ s(age, k = 5),
% qmodel = list( ~ s(age, k = 4), ~ s(age, k = 4)),
% srmodel = ~ 1,
% n1model = ~ s(age, k = 5),
% vmodel = list( ~ 1, ~ 1, ~ 1),
% verbose = FALSE)
% fit0
% @
%
% Within the sca fit object there's a collection of submodels for each component of the \aFa stock assessment model (fishing mortality, survey catchability, stock recruitment relationship, initial population and the observation variance for catch numbers and survey indices). For readability the following example will use the fishing mortality submodel although the methods and models described can be applied to each of the 5 submodels.
%
% The submodel can be extracted using:
%
% <<>>=
% fmod <- fmodel(fit0)
% @
%
% inside this object is a formula, the coeficients estimated by the model and their covariance matrix.
%
% <<sim_pred_fmodel_coef, results = 'hide', eval=FALSE>>=
% coef(fmod)
% vcov(fmod)
% @
%
% it is also possible to get the data behind the fit and the design matrix required to get the fitted values
%
% <<results = 'hide', eval=FALSE>>=
% as.data.frame(fmod)
% getX(fmod)
% @
%
% this makes it possble to check the fit manually like this (note that the data is centered in the model, which is addressed by adding the centering after the prediction is made, see code below):
%
% <<eval=FALSE>>=
% dat <- as.data.frame(fmod)
% X <- getX(fmod)
% b <- coef(fmod)
% dat$fit <- exp(c(X %*% b))# + dat$data.centering)
%
% plot(fit ~ age, type = "l", data = dat[dat$year == 2016,],
% ylab = "Estimated fishing mortality at age",
% ylim = c(0, max(dat$fit)), las = 1)
% @
%
% or simulate from the fit by resampling based on the variance matrix of the coefficients, which is done by simulating many coefficients (here called `bsim`).
%
% <<eval=FALSE>>=
% # get the variance matrix of the coefficients and simulate
% Sigma <- vcov(fmod)[,,1]
% bsim <- mvrnorm(999, b, Sigma)
%
% fit_sim <- exp(X %*% bsim)
% dat$ci_lower <- apply(fit_sim, 1, quantile, p = 0.025)
% dat$ci_upper <- apply(fit_sim, 1, quantile, p = 0.975)
%
% plot(fit ~ age, type = "l", data = dat[dat$year == 2016,],
% ylab = "Estimated fishing mortality at age",
% ylim = c(0, max(dat$ci_upper)), las = 1)
% lines(ci_lower ~ age, data = dat[dat$year == 2016,], lty = 2)
% lines(ci_upper ~ age, data = dat[dat$year == 2016,], lty = 2)
% @
%
% The above code is to show how it is possible to use an FLa4a submodel to predict and simulate, and not intended to be used by the user. The recomended way to predict from a submodel and make simulations is to use the `genFLQuant` function (and for finer control, there is also a `simulate` function). The above example can be done using FLa4a functions as follows:
%
% <<eval=FALSE>>=
% fmod_fit <- genFLQuant(fmod)
% dat <- as.data.frame(fmod_fit)
%
% plot(data ~ age, type = "l", data = dat[dat$year == 2016,],
% ylab = "Estimated fishing mortality at age",
% ylim = c(0, max(dat$data)), las = 1)
% @
%
% or using the `ggplotFL` package:
%
% <<eval=FALSE>>=
% fmod_fit <- genFLQuant(fmod)
%
% ggplot(data = fmod_fit[,"2016"], aes(x = age, y = data)) +
% geom_point() + geom_line() +
% ylab("Estimated fishing mortality at age") +
% ylim(0, max(fmod_fit))
% @
%
% and the simulations and confidence intervals are easily computed:
%
% <<eval=FALSE>>=
% # simulate 999
% fmod_fit_sim <- genFLQuant(fmod, nsim = 999)
%
% # reduce to quantiles
% fmod_fit_sim <- quantile(fmod_fit_sim[,"2016"], c(0.025, 0.50, 0.975))
% @
%
% plotting can be done by converting to a data.frame, reshaping and using standard `ggplot2` functionality
%
% <<eval=FALSE>>=
% dat <-
% reshape(
% as.data.frame(fmod_fit_sim, drop=TRUE),
% timevar = "iter", idvar = "age", direction = "wide"
% )
%
% ggplot(data=dat, aes(x = age, y = `data.50%`)) +
% geom_ribbon(aes(ymin = `data.2.5%`, ymax = `data.97.5%`),
% fill="red", alpha = .15) +
% geom_point() + geom_line() +
% ylab("Estimated fishing mortality at age") +
% ylim(0, max(fmod_fit_sim))
% @
%
\section{The statistical catch-at-age stock assessment framework}\label{sec:sca}
The \aFa stock assessment framework is implemented in \R through the method \code{sca()}. The method call requires as a minimum a FLStock object and a FLIndices object, in which case the default submodels will be set by the method.
%Having described building blocks, basic formulations and effects available to build a submodel's model, it's important to look into specific formulations and relate them to commonly known representations. Note that although a large number of formulations are available for each submodel, the user must carefuly decide on the full stock assessment model being build and avoid over-paramerizing. Over-parametrization may lead to non-convergence, but may also end up not being very useful for prediction/forecasting, which is one of the main objectives of stock assessment.
<<>>=
data(ple4)
data(ple4.indices)
fit <- sca(ple4, ple4.indices)
stk <- ple4 + fit
plot(stk)
@
By calling the fitted object the default submodel formulas are printed in the console:
<<>>=
fit
@
To set specific submodels the user has to write the relevant R formula and include it in the call. The arguments for each submodel are self-explanatory: fishing mortality is 'fmodel', indices' catchability is 'qmodel', stock-recruitment is 'srmodel', observation variance is 'vmodel' and for initial year's abundance is 'n1model'. The following model comes closer to the official stock assessment of North Sea plaice, as such we'll name it $0$ and keep it for future comparisons:
For future referencing we'll start with a base fit to be used for future comparisons, named fit 0.
<<fit0>>=
@
<<>>=
fmod0 <- ~s(age, k=6)+s(year, k=10)+te(age, year, k=c(3,8))
qmod0 <- list(~s(age, k = 4), ~s(age, k = 3), ~s(age, k = 3)+year, ~s(age, k = 3), ~s(age, k = 4), ~s(age, k = 6))
srmod0 <- ~ s(year, k=20)
vmod0 <- list(~s(age, k=4), ~1, ~1, ~1, ~1, ~1, ~1, ~1)
n1mod0 <- ~ s(age, k=3)
fit0 <- sca(ple4, ple4.indices, fmodel=fmod0, qmodel=qmod0, srmodel=srmod0, n1model=n1mod0, vmodel=vmod0)
stk0 <- ple4 + fit0
plot(stk0)
@
As before by calling the fitted object submodels' formulas are printed in the console:
<<>>=
fit
@
The method \code{sca} has other arguments which may be set by the user:
\begin{description}
\item [covar:] a \code{FLQuant} with covariates;
\item [wkdir:] a folder (character) where the \ADMB files will be saved for posterior inspection by the user;
\item [verbose:] be more verbose (logical);
\item [fit:] type of fit (character),
\begin{itemize}
\item 'MP' runs the minimizer without trying to invert the hessian and as such doesn't return the covariance matrix of the parameters, normally used inside \MSE loops where parameter variance may not be relevant;
\item 'assessment' runs minimizer and inverts hessian, returns the covariance matrix of the estimated parameters and the convergence criteria set in \ADMB;
\item 'MCMC' runs \ADMB's MCMC fit
\end{itemize}
\item [center:] shall observations be centered before fitting (logical);
\item [mcmc:] \ADMB's MCMC arguments (character vector), must be paired with \code{fit="MCMC"}.
\end{description}
There are a set of methods for \aFa fit objects which help manipulating \code{sca()} results, namely:
\begin{description}
\item [+:] update the stock object with the fitted fishing mortalities, population abundance and catch in numbers at age;
\end{description}
\subsection{Fishing mortality submodel ($F_{ay}$)}
<<echo=FALSE>>=
data(ple4)
data(ple4.indices)
@
We will now take a look at some examples for $F$ models and the forms that we can get.
% A non-separable model, where we consider age and year to interact can be modeled using a smooth interaction term in the F model using a tensor product of cubic splines with the `te` method (`r fign('te1')`), again borrowed from [mgcv](http://cran.r-project.org/web/packages/mgcv/index.html).
%
% <<>>=
% fmod <- ~ te(age, year, k = c(4,20))
% fit <- sca(ple4, ple4.indices[1], fmod)
% @
%
% <<te1, fig.cap="Fishing mortality smoothed non-separable model">>=
% wireframe(harvest(fit), zlab="F")
% @
%
% In the last examples the fishing mortalities (Fs') are linked across age and time. What if we want to free up a specific age class because in the residuals we see a consistent pattern. This can happen, for example, if the spatial distribution of juveniles is disconnected to the distribution of adults. The fishery focuses on the adult fish, and therefore the the F on young fish is a function of the distribution of the juveniles and could deserve a specific model. This can be achieved by adding a component for the year effect on age 1 (`r fign('age1')`).
%
% <<>>=
% fmod <- ~ te(age, year, k = c(4,20)) + s(year, k = 5, by = as.numeric(age==1))
% fit <- sca(ple4, ple4.indices[1], fmod)
% @
%
% <<age1, fig.cap="Fishing mortality age-year interaction model with extra age 1 smoother.">>=
% wireframe(harvest(fit), zlab="F")
% @
%
\subsubsection{Separable model}
One of the most useful models for fishing mortality is one in which 'age' and 'year' effects are independent, that is, where the shape of the selection pattern does not change over time, but the overall level of fishing mortality do. Commonly called a 'separable model'.
A full separable model in \aFa is written using the \code{factor} function which converts age and year effects into categorical values, forcing a different coefficient to be estimated for each level of both effects. This model has \code{age x year} number of parameters.
<<>>=
fmod1 <- ~ factor(age) + factor(year)
fit1 <- sca(ple4, ple4.indices, fmodel=fmod1, fit="MP")
@
One can reduce the number of parameters and add dependency along both effects, although still keeping independence of each other, by using smoothers rather than \code{factor}. We'll use a (unpenalised) thin plate spline provided by package \href{http://cran.r-project.org/web/packages/mgcv/}{mgcv} method \code{s()}. We're using the North Sea Plaice data, and since it has 10 ages we will use a simple rule of thumb that the spline should have fewer than $\frac{10}{2} = 5$ degrees of freedom, and so we opt for 4 degrees of freedom. We will also do the same for year and model the change in $F$ through time as a smoother with 20 degrees of freedom.
<<>>=
fmod2 <- ~ s(age, k=4) + s(year, k=20)
fit2 <- sca(ple4, ple4.indices, fmodel=fmod2, fit="MP")
@
An interesting extension of the separable model is the 'double separable' where a third factor or smoother is added for the cohort effect.
<<>>=
fmod3 <- ~ s(age, k=4) + s(year, k=20) + s(as.numeric(year-age), k=10)
fit3 <- sca(ple4, ple4.indices, fmodel=fmod3, fit="MP")
@
Figures~\ref{fig:sep00} and \ref{fig:sep01} depicts the three models selectivities for each year. Each separable model has a single selectivity that changes it's overall scale in each year, while the double separable introduces some variability over time by modeling the cohort factor.
<<sep00, fig.cap="Selection pattern of separable models. Each line represents the selection pattern in a specific year. Independent age and year effects (factor), internally dependent age and year (smooth), double separable (double).">>=
flqs <- FLQuants(factor=harvest(fit1), smooth=harvest(fit2), double=harvest(fit3))
pset <- list(strip.background=list(col="gray90"))
xyplot(data~age|qname, groups=year, data=flqs, type="l", col=1, layout=c(3,1), ylab="fishing mortality", par.settings=pset)
@
<<sep01, fig.cap="Fishing mortality of separable models. Independent age and year effects (factor), internally dependent age and year (smooth), double separable (double).">>=
wireframe(data~age+year|qname, data=as.data.frame(flqs), layout=c(3,1))
@
\subsubsection{Constant selectivity for contiguous ages or years}
To set these models we'll use the method \code{replace()} to define which ages or years will be modelled together with a single coefficient. The following example shows \code{replace()} in operation. The dependent variables used in the model will be changed and attributed the same age or year, as such during the fit observations of those age or year with will be seen as replicates. One can think of it as sharing the same mean value, which will be estimated by the model.
<<>>=
age <- 1:10
# last age same as previous
replace(age, age>9, 9)
# all ages after age 6
replace(age, age>6, 6)
year <- 1950:2010
replace(year, year>2005, 2005)
@
In the $F$ submodel one can use this method to fix the estimation of $F$ in the plus group to be the same as in the last non-aggregated age.
<<>>=
fmod <- ~ s(replace(age, age>9, 9), k=4) + s(year, k=20)
fit <- sca(ple4, ple4.indices, fmod)
@
<<ctsselage, fig.cap="F-at-age fixed above age 9">>=
wireframe(harvest(fit), zlab="F")
@
Or estimate the average $F$ in the most recent years, instead of averaging after the assessment to compute the \emph{statu quo} selection pattern.
<<>>=
fmod <- ~ s(age, k=4) + s(replace(year, year>2013, 2013), k=20)
fit <- sca(ple4, ple4.indices, fmod)
@
<<ctsselyear, fig.cap="F-at-age fixed for the most recent 5 years">>=
wireframe(data~age+year, data=harvest(fit)[,ac(2010:2017)], screen=c(z=-130, y=0, x=-60), zlab="F")
@
\subsubsection{Time blocks selectivity}
To define blocks of data \code(sca()) uses the method \code{breakpts()}, which creates a factor from a vector with levels defined by the second argument.
<<>>=
year <- 1950:2010
# two levels separated in 2000
breakpts(year, 2000)
# five periods with equal interval
breakpts(year, seq(1949, 2010, length=6))
@
Note \code{seq()} computes 'left-open' intervals, which means that to include 1950 the sequence must start one year earlier.
These methods can be used to create discrete time series, for which a different selection pattern is allowed in each block. This is called an interaction in statistical modelling parlance, and typically a \code{*} denotes an interaction term; for smoothers an interaction is achieved using the \code{by} argument. When this argument is a \code{factor} a replicate of the smooth is produced for each factor level.
In the next case we'll use the \code{breakpts()} to split the time series at 1990, although keeping the same shape in both periods, a thin plate spline with 3 knots (Figure~\ref{fig:brk}).
<<>>=
fmod <- ~s(age, k = 3, by = breakpts(year, 1990))
fit <- sca(ple4, ple4.indices, fmod)
@
<<brk, echo=FALSE, fig.cap="F-at-age in two periods using in both cases a thin plate spline with 3 knots">>=
wireframe(harvest(fit), zlab="F")
@
\subsubsection{Time changing selectivity}
In many cases, it may be desirable to allow the selection pattern to evolve over time, from year to year. Again there are several ways to do this, one way is to estimate a mean selection pattern, while also allowing F to vary over time for each age. This is like a seperate smoother over year, with 'age blocks' so, looking back at previous examples, we have:
<<>>=
fmodel <- ~ s(year, k = 15, by = factor(age)) + s(age, k = 4)
@
This is a type of interaction between age and year, but the only connection (or correlation) across ages is via the smoother on age, however there are still 15 degrees of freedom for each age, so the model 5 x 15 + 4 = 69 degrees of freedom. To include correlation across ages and years together then the tensor product (`te()` function) is used, this has the effect of restricting the flexibility of the model for F. In the following, there is a smoother in 2 dimensions (age and year) where there is 5 degrees of freedom in the age direction, and 15 in the year dimension, resulting in a total of 5 x 15 = 65 degrees of freedom
<<>>=
fmodel <- ~ te(age, year, k = c(5, 15))
@
Often the above formulations provide too much flexibility, and a more complicated, but simpler model is preferable:
<<>>=
fmodel <- ~ s(age, k = 4) + s(year, k = 15) + te(age, year, k = c(3, 5))
@
in the above model, the main effects for age and year still have similar flexibility to the full tensor model, however, the interaction (or the change in F at age over time) has been restricted, so that the full model now has 4 + 15 + 3 x 5 = 34 degrees of freedom.
\subsubsection{Trawl fleets}
\subsubsection{Nets and Liners fleets}
\subsubsection{Multigear fleets}
\subsubsection{Trawl surveys}
\subsubsection{Closed form selection pattern}
One can use a closed form for the selection pattern. The only requirement is to be able to write it as a \R formula, the example below uses a logistic form.
<<>>=
fmod <- ~ I(1/(1+exp(-age)))
fit <- sca(ple4, ple4.indices, fmod)
@
<<logistic, fig.cap="F-at-age logistic">>=
wireframe(harvest(fit), zlab="F")
@
% \subsubsection{More models}
%
% More complicated models can be built with these tools. For example, `r fign('ageind')` shows a model where the age effect is modelled as a smoother (the same thin plate spline) throughout years but independent from each other.
%
% <<>>=
% fmod <- ~ factor(age) + s(year, k=10, by = breakpts(age, c(2:8)))
% fit <- sca(ple4, ple4.indices, fmod)
% @
%
% <<ageind, fig.cap="F-at-age as thin plate spline with 3 knots for each age">>=
% wireframe(harvest(fit), zlab="F")
% @
%
% A quite complex model that implements a cohort effect can be set through the following formula. `r fign('coh')` shows the resulting fishing mortality. Note that in this case we end up with a variable F pattern over time, but rather than using 4 * 10 = 40 parameters, it uses, 4 + 10 + 10 = 24.
%
% <<>>=
% fmodel <- ~ s(age, k = 4) + s(pmax(year - age, 1957), k = 10) + s(year, k = 10)
% fit <- sca(ple4, ple4.indices, fmodel=fmodel)
% @
%
% <<coh, echo=FALSE, fig.cap="F-at-age with a cohort effect.">>=
% wireframe(harvest(fit), zlab="F")
% @
%
\subsection{Abundance indices catchability submodel ($Q_{ays}$)}
The catchability submodel is set up the same way as the $F$ submodel and the tools available are the same. The only difference is that the submodel is set up as a list of formulas, where each formula relates with one abundance index. There's no limitation in the number of indices or type that can be used for a fit. It's the analyst that has to decide based on her/his expertise and knowledge of the stock and fleet dynamics.
\subsubsection{Catchability submodel for age based indices}
A first model is simply a dummy effect on age, which means that a coefficient will be estimated for each age. Note that this kind of model considers that levels of the factor are independent (Figure~\ref{fig:dummyage}).
<<dummyage, fig.cap="Catchability age independent model">>=
qmod <- list(~factor(age))
fit <- sca(ple4, ple4.indices[1], qmodel=qmod)
qhat <- predict(fit)$qmodel[[1]]
wireframe(qhat, zlab="q")
@
If one considers catchability at a specific age to be dependent on catchability on the other ages, similar to a selectivity modelling approach, one option is to use a smoother at age, and let the data 'speak' regarding the shape (Figure~\ref{fig:smoothage}).
<<smoothage, fig.cap="Catchability smoother age model">>=
qmod <- list(~ s(age, k=4))
fit <- sca(ple4, ple4.indices[1], qmodel=qmod)
qhat <- predict(fit)$qmodel[[1]]
wireframe(qhat, zlab="q")
@
Finally, one may want to investigate a trend in catchability with time, very common in indices built from CPUE data. In the example given here we'll use a linear trend in time, set up by a simple linear model (Figure~\ref{fig:qtrend}).
<<qtrend, fig.cap="Catchability with a linear trend in year">>=
qmod <- list( ~ s(age, k=4) + year)
fit <- sca(ple4, ple4.indices[1], qmodel=qmod)
qhat <- predict(fit)$qmodel[[1]]
wireframe(qhat, zlab="q")
@
\subsubsection{Catchability submodel for age aggregated biomass indices}
The previous section was focused on age disaggregated indices, but age aggregated indices (CPUE, biomass, DEPM, etc) may also be used to tune the total biomass of the population. In these cases a different class for the index must be used, the \code{FLIndexBiomass}, which uses a vector \code{index} with the age dimension called 'all'. Note that in this case the qmodel should be set without age factors, although it can have a 'year' component and covariates if needed. An interesting feature with biomass indices is the age range they refer to can be specified.
<<>>=
# simulating a biomass index (note the name of the first dimension element) using
# the ple4 biomass and an arbritary catchability of 0.001 plus a lognormal error.
dnms <- list(age="all", year=range(ple4)["minyear"]:range(ple4)["maxyear"])
bioidx <- FLIndexBiomass(FLQuant(NA, dimnames=dnms))
index(bioidx) <- stock(ple4)*0.001
index(bioidx) <- index(bioidx)*exp(rnorm(index(bioidx), sd=0.1))
range(bioidx)[c("startf","endf")] <- c(0,0)
# note the name of the first dimension element
index(bioidx)
# fitting the model
fit <- sca(ple4, FLIndices(bioidx), qmodel=list(~1))
@
To estimate a constant selectivity over time one used the model \code{\midtilde 1}. As a matter of fact the estimate value, \Sexpr{round(predict(fit)$qmodel[[1]][1,drop=TRUE],5)}, is not very far from the simulated one, 0.001.
An example where the biomass index refers only to age 2 to 4 (for example a CPUE that targets these particular ages).
<<>>=
# creating the index
dnms <- list(age="all", year=range(ple4)["minyear"]:range(ple4)["maxyear"])
bioidx <- FLIndexBiomass(FLQuant(NA, dimnames=dnms))
# but now use only ages 2:4
index(bioidx) <- tsb(ple4[ac(2:4)])*0.001
index(bioidx) <- index(bioidx)*exp(rnorm(index(bioidx), sd=0.1))
range(bioidx)[c("startf","endf")] <- c(0,0)
# to pass this information to the model one needs to specify an age range
range(bioidx)[c("min","max")] <- c(2,4)
# fitting the model
fit <- sca(ple4, FLIndices(bioidx), qmodel=list(~1))
@
Once more the estimate value, \Sexpr{round(predict(fit)$qmodel[[1]][1,drop=TRUE],5)}, is not very far from the simulated one, 0.001.
\subsubsection{Catchability submodel for single age indices}
Similar to age aggregated indices one may have an index that relates only to one age, like a recruitment index. In this case the \code{FLIndex} object must have in the first dimension the age it referes to. The fit is then done relating the index with the proper age in numbers. Note that in this case the qmodel should be set without age factors, although it can have a 'year' component and covariates if needed.
<<>>=
idx <- ple4.indices[[1]][1]
fit <- sca(ple4, FLIndices(recidx=idx), qmodel=list(~1))
# the estimated catchability is
predict(fit)$qmodel[[1]]
@
\subsection{Stock-recruitment submodel ($R_y$)}
The S/R submodel is a special case, in the sense that it can be set up with the same linear tools as the $F$ and $Q$ models, but it can also use some hard coded models. The example shows how to set up a simple dummy model with \code{factor()}, a smooth model with \code{s()}, a Ricker model (\code{ricker()}), a Beverton and Holt model (\code{bevholt()}), a hockey stick model (\code{hockey()}), and a geometric mean model (\code{geomean()}). See Figure~\ref{fig:srmod} for results. As mentioned before, the 'structural' models have a fixed variance, which must be set by defining the coefficient of variation.
<<>>=
srmod <- ~ factor(year)
fit <- sca(ple4, ple4.indices, srmodel=srmod)
srmod <- ~ s(year, k=10)
fit1 <- sca(ple4, ple4.indices, srmodel=srmod)
srmod <- ~ ricker(CV=0.05)
fit2 <- sca(ple4, ple4.indices, srmodel=srmod)
srmod <- ~ bevholt(CV=0.05)
fit3 <- sca(ple4, ple4.indices, srmodel=srmod)
srmod <- ~ hockey(CV=0.05)
fit4 <- sca(ple4, ple4.indices, srmodel=srmod)
srmod <- ~ geomean(CV=0.05)
fit5 <- sca(ple4, ple4.indices, srmodel=srmod)
@
<<srmod, fig.cap="Stock-recruitment models fits">>=
flqs <- FLQuants(factor=stock.n(fit)[1], smother=stock.n(fit1)[1], ricker=stock.n(fit2)[1], bevholt=stock.n(fit3)[1], hockey=stock.n(fit4)[1], geomean=stock.n(fit5)[1])
xyplot(data~year, groups=qname, data=flqs, type="l", auto.key=list(points=FALSE, lines=TRUE, columns=3), ylab="No. recruits")
@
\subsection{Observation variance submodel ($\{\sigma^2_{ay}, \tau^2_{ays}\}$)}
The variance model allows the user to set up the shape of the observation variances $\sigma^2_{ay}$ and $\tau^2_{ays}$. This is an important subject related with fisheries data used for input to stock assessment models.
The defaults assume a U-shape model for catch-at-age and constant variance for abundance indices. The first relies on the fact that it's common to have more precision on the most represented ages and less precision on the less frequent ages which tend to be the younger and older individuals. These sizes are less caught by the fleets and as such do not appear as often at the auction markets samples. With regards to the abundance indices, one assumes a scientific survey to have a well designed sampling scheme and protocols which keep observation error at similar levels across ages.
<<>>=
vmod <- list(~s(age, k=3), ~1)
fit1 <- sca(ple4, ple4.indices[1], vmodel=vmod)
vmod <- list(~s(age, k=3), ~s(age, k=3))
fit2 <- sca(ple4, ple4.indices[1], vmodel=vmod)
@
Variance estimated for the constant model is \Sexpr{round(predict(fit)$vmodel[[2]][1,drop=TRUE],3)} while for the U-shape model, fitted with a smoother, changes with ages (Figure~\ref{fig:vmod}).
<<vmod, fig.cap="Abundance index observation variance estimate">>=
wireframe(predict(fit2)$vmodel[[2]], zlab="variance")
@
Observation variance options have an impact in the final estimates of population abundance, which can be seen in Figure~\ref{fig:vmodimpact}.
<<vmodimpact, fig.cap="Population estimates using two different variance models">>=
flqs <- FLQuants(smother=stock.n(fit1), factor=stock.n(fit2))
xyplot(data~year|age, groups=qname, data=flqs, type="l",
scales=list(y=list(relation="free", draw=FALSE)),
auto.key=list(points=FALSE, lines=TRUE, columns=2),
par.settings=list(superpose.line=list(col=c("gray35", "black")),
strip.background=list(col="gray90")), ylab="")
@
\subsection{Initial year abundance submodel ($N_{a,y=1}$)}
The submodel for the stock number at age in the first year of the time series is set up with the usual modelling tools (Figure~\ref{fig:ny1}). Beare in mind that the year effect does not make sense here since it refers to a single year, the first in the time series of data available. This model has its influence limited to the initial lower triangle of the population matrix, which in assessments with long time series doesn't make much difference. Nevertheless, when modelling stocks with short time series in relation to the number of ages present, it becomes more important and should be given proper attention.
<<>>=
n1mod <- ~s(age, k=3)
fit1 <- sca(ple4, ple4.indices, fmodel=fmod0, qmodel=qmod0, srmodel=srmod0, vmodel=vmod0, n1model=n1mod)
n1mod <- ~factor(age)
fit2 <- sca(ple4, ple4.indices, fmodel=fmod0, qmodel=qmod0, srmodel=srmod0, vmodel=vmod0, n1model=n1mod)
flqs <- FLQuants(smother=stock.n(fit1)[,1], factor=stock.n(fit2)[,1])
@
<<ny1, fig.cap="Nay=1 models">>=
pset <- list(superpose.line=list(col=c("gray50", "black"), lty=c(1,2)))
xyplot(data~age, groups=qname, data=flqs, type="l", auto.key=lgnd, par.settings=pset, ylab="")
@
The impact in the overall perspective of the stock status is depicted in Figure~\ref{fig:n1modimpact}. As time goes by the effect of this model vanishes and the fits become similar.
<<n1modimpact, fig.cap="Population estimates using two different variance models">>=
flqs <- FLQuants(smother=stock.n(fit1), factor=stock.n(fit2))
pset$strip.background <- list(col="gray90")
scl <- list(y=list(relation="free", draw=FALSE))
xyplot(data~year|factor(age), groups=qname, data=flqs, type="l", scales=scl, auto.key=lgnd, par.settings=pset, ylab="")
@
\subsection{Data weigthing}
%====================================================================
% COLIN TO CHECK THE SECTION
%====================================================================
By default the likelihood components are not weighted and the contribution of each to the maximum likelihood depends on their own likelihood score. However, the user may change these weights by penalizing data points, the $w_{ays}$ in section~\ref{sec:maths}. The likelihood score of each data point will be multiplied by the normalized weights ($\sum w_{ays} = 1$). This is done by adding a variance matrix to the \code{catch.n} and \code{index.n} slots of the stock and index objects. The values should be given as coefficients of variation on the log scale, so that variance is $\log{({CV}^2 + 1)}$. Figures Figure~\ref{fig:likwgt} and \ref{fig:likwgtimpact} show the results of the two fits in the population abundance and stock summary.
<<>>=
stk <- ple4
idx <- ple4.indices[1]
# cv of observed catches
varslt <- catch.n(stk)
varslt[] <- 0.4
catch.n(stk) <- FLQuantDistr(catch.n(stk), varslt)
# cv of observed indices
varslt <- index(idx[[1]])
varslt[] <- 0.1
index.var(idx[[1]]) <- varslt
# run
fit1 <- sca(stk, idx, fmodel=fmod0, qmodel=qmod0, srmodel=srmod0, vmodel=vmod0, n1model=n1mod0)
flqs <- FLQuants(nowgt=stock.n(fit0), extwgt=stock.n(fit1))
@
<<likwgt, fig.cap="Stock summary of distinct likelihood weightings">>=
xyplot(data~year|factor(age), groups=qname, data=flqs, type="l", scales=scl, auto.key=lgnd, par.settings=pset, ylab="")
@
<<likwgtimpact, fig.cap="Population estimates using two different variance models">>=
flsts <- FLStocks(nowgt=ple4+fit0, wgt=ple4 + fit1)
plot(flsts)
@
Note that by using a smaller CV for the index, one is increasing the contribution of the survey and penalizing catch at age, in relative terms. The ratio between likelihood scores of both fits show this effect with catch at age increasing by 2.3 while the index increases almost 8 fold.
<<>>=
fit0 <- sca(ple4, ple4.indices[1], fmodel=fmod0, qmodel=qmod0, srmodel=srmod0, vmodel=vmod0, n1model=n1mod0)
(fitSumm(fit1)/fitSumm(fit0))[c(2,8,9),]
@
\subsection{Working with covariates}
In linear model one can use covariates to explain part of the variance observed on the data that the 'core' model does not explain. The same can be done in the \aFa framework. The example below uses the North Atlantic Oscillation (NAO) index to model recruitment.
<<>>=
nao <- read.table("https://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/norm.nao.monthly.b5001.current.ascii.table", skip=1, fill=TRUE, na.strings="-99.90")
dnms <- list(quant="nao", year=1950:2024, unit="unique", season=1:12, area="unique")
nao <- FLQuant(unlist(nao[,-1]), dimnames=dnms, units="nao")
nao <- seasonMeans(trim(nao, year=dimnames(stock.n(ple4))$year))
@
First by simply assuming that the NAO index drives recruitment (`r fign('naor')`).
<<>>=
srmod <- ~ nao
fit2 <- sca(ple4, ple4.indices[1], qmodel=list(~s(age, k=4)), srmodel=srmod, covar=FLQuants(nao=nao))
flqs <- FLQuants(simple=stock.n(fit)[1], covar=stock.n(fit2)[1])
@
<<naor, echo=FALSE, fig.cap="Recruitment model with covariates. Using the NAO index as a recruitment index.">>=
xyplot(data~year, groups=qname, data=flqs, type="l",
auto.key=list(points=FALSE, lines=TRUE, columns=2),
par.settings=list(superpose.line=list(col=c("gray35", "black"), lty=c(2,1), lwd=c(1,1.5)),
strip.background=list(col="gray90")), ylab="")
@
In a second model we're using the NAO index not to model recruitment directly but to model one of the parameters of the S/R function (`r fign('naor2')`).
<<>>=
srmod <- ~ ricker(a=~nao, CV=0.25)
fit3 <- sca(ple4, ple4.indices[1], qmodel=list(~s(age, k=4)), srmodel=srmod, covar=FLQuants(nao=nao))
flqs <- FLQuants(simple=stock.n(fit)[1], covar=stock.n(fit3)[1])
@
<<naor2, echo=FALSE, fig.cap="Recruitment model with covariates. Using the NAO index as a covariate for the stock-recruitment model parameters.">>=
xyplot(data~year, groups=qname, data=flqs, type="l",
auto.key=list(points=FALSE, lines=TRUE, columns=2),
par.settings=list(superpose.line=list(col=c("gray35", "black"), lty=c(2,1), lwd=c(1,1.5)),
strip.background=list(col="gray90")), ylab="")
@
Note that covariates can be added to any submodel using the linear model capabilities of R.
\subsection{Assessing \ADMB files}
The framework gives access to the files produced to run the \ADMB fitting routine through the argument \code{wkdir}. When set up all the \ADMB files will be left in the directory. Note that the \ADMB tpl file is distributed with the \code{FLa4a}. One can get it from your \R library, under the folder \code{myRlib/FLa4a/admb/}.
<<eval=FALSE>>=
fit1 <- sca(ple4, ple4.indices, wkdir="fit1run")
@
\subsection{Missing observations in the catch matrix or index}
%====================================================================
% COLIN TO CHECK THE SECTION
%====================================================================
How are the data interpolated etc ...
\section{Diagnostics}\label{sec:diagn}