The sport_harvest_estimator is designed to present the best available data in ways that encourage a structured, quantitative consideration of the uncertainties of fisheries planning. It facilitates the rapid examination of how a set of assumptions about “average” effort and rates of coho retention do or do not align with expectations, as well as whether these assumptions yield values that maintain or depart from past preseason FRAM inputs and post-season harvest estimates. This process enables managers and analysts to better quantify the potential consequences of various real-world processes affecting sport fisheries (e.g., spatio-temporal effort shifts, regulation shifts).

This analysis reports results from pre-specified, “blind” or “naive” applications of the method, and so excludes the data exploration that is one of the primary goals of the approach. This mode of assessment nonetheless offers insight into some of the strengths and limitations of the method.

Bringing data to bear as quickly as possible was another key motivation for developing the sport_harvest_estimator. This assessment therefore begins with the most recent (2021) season for Area 5, for which post-season estimates have just been completed; this also suggests how the method is suited to in-season use. First, a “default” application of the method is tested for its ability to reproduce the final estimates given 2021 data. The method is then demonstrated in an “as-applied” mode, using only the data that would have been available during the 2021 North of Falcon (NOF) process.

1 Area 5 2021, reproducing 2021 given 2021 dockside

The table below uses the median coho kept per angler per day (hapd) and the median anglers per day (apd) from the 2021 dockside creel data only, with apd expanded by the 2014-2020 median sampling rate for each timestep. The result of the harvest estimation method (column pred) is shown alongside the final NOF2021 coho FRAM inputs (pre) and the recently completed 2021 post-season estimate (pst). The pst-pre and pst-pred columns give the post-season estimate minus the preseason input and minus the new prediction, respectively, so a positive value indicates an underestimate of catch. Values in the final column are shaded light green if the new method improves in absolute error, and darker green if it both improves in absolute error and does not underestimate catch (pst-pred ≤ 0).

This test indicates that the central tendencies of the sampled data can generate values closer to the post-season estimates than the preseason inputs were, but it does not demonstrate preseason skill, because the 2021 data were not available before the season. In addition, the considerable remaining error in timestep 3 (August) indicates scope for continued improvement.
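The per-timestep calculation behind the pred column can be sketched in a few lines of base R. This is a minimal illustration only: the function name `estimate_harvest`, its argument names, and all input values are hypothetical, and `days` stands in for the `df` column of the report code (assumed here to be the number of days of each day type in the timestep).

```r
# Minimal sketch of the per-timestep calculation; all inputs are hypothetical.
# `days` plays the role of `df` in the report code (assumed to be the number
# of weekday and weekend days in the timestep).
estimate_harvest <- function(days, anglers_sampled, sample_rate, kept_per_angler) {
  # expand sampled anglers-per-day to total anglers-per-day
  anglers_expanded <- anglers_sampled / sample_rate
  # harvest per day type, summed over the weekday and weekend strata
  sum(days * anglers_expanded * kept_per_angler)
}

pred <- estimate_harvest(
  days            = c(wkday = 18, wkend = 13),
  anglers_sampled = c(wkday = 20, wkend = 40),   # median sampled anglers per day
  sample_rate     = 0.25,                        # median sampling rate
  kept_per_angler = c(wkday = 0.1, wkend = 0.2)  # median coho kept per angler per day
)
# 18 * (20 / 0.25) * 0.1 + 13 * (40 / 0.25) * 0.2 = 144 + 416 = 560
print(pred)
```

Keeping the weekday and weekend strata separate until the final sum mirrors the grouping by `wkend` in the report code, so differences in effort and retention between day types carry through to the estimate.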

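# packages used by the code in this report; the data frames pssp, rmis_cs,
# a5_21 and fs_ps_spt are assumed to have been loaded beforehand
library(dplyr)
library(tidyr)
library(gt)
left_join(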
  #dockside median anglers-per-day and coho kept-per-angler-per-day
  pssp |> 
    filter(area_code == "05", yr == "2021", between(ts, 2, 4)) |> 
    group_by(area_code, yr, ts, wkend) |> 
    summarise(across(c(anglers, coho_kpa_ad), median), .groups = "drop")
  ,
  #per timestep median sampling rates
  rmis_cs |> select("area_code", "year", "yr", "ts", "sr") |> 
    filter(area_code == "05", between(year, 2014, 2020)) |> 
    group_by(area_code, ts) |> 
    summarise(across(c(sr), median), .groups = "drop")
  ,
  by = c("area_code", "ts")
) |> 
  mutate(
    apd_expnd = anglers / sr,
    df = case_when(
      wkend == "wkday" & ts %in% 2:3 ~ 18,
      wkend == "wkend" & ts %in% 2:3 ~ 13,
      wkend == "wkday" & ts == 4 ~ 18,
      wkend == "wkend" & ts == 4 ~ 12
    ),
    pred = df * apd_expnd * coho_kpa_ad
  ) |> 
  group_by(area_code, yr, ts) |> 
  summarise(pred = sum(pred), .groups = "drop") |> 
  left_join(a5_21, by = c("area_code", "yr", "ts")) |> 
  mutate(
    `pst-pre` = pst - pre,
    `pst-pred` = pst - pred
  ) |> 
  gt() |> 
  fmt_integer(columns = -c(area_code, yr)) |> 
  tab_style(
    style = cell_fill("#B9FAC4"),
    locations = cells_body(
      columns = `pst-pred`,
      rows = abs(`pst-pred`) < abs(`pst-pre`)
    )
  ) |> 
  tab_style(
    style = cell_fill("#39D155"),
    locations = cells_body(
      columns = `pst-pred`,
      rows = (abs(`pst-pred`) < abs(`pst-pre`)) & `pst-pred` <= 0
    )
  ) |> 
  gt::tab_header(title = "reproducing 2021 given 2021 dockside obs") 
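reproducing 2021 given 2021 dockside obs

| area_code | yr | ts | pred | pre | pst | pst-pre | pst-pred |
|---|---|---|---|---|---|---|---|
| 05 | 2021 | 2 | 418 | 1,796 | 433 | −1,363 | 15 |
| 05 | 2021 | 3 | 948 | 5,326 | 2,827 | −2,499 | 1,879 |
| 05 | 2021 | 4 | 15,128 | 9,356 | 16,337 | 6,981 | 1,209 |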
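The shading rule can be checked directly against the values in this table. The sketch below re-enters the pred, pre and pst values by hand and applies the same two conditions as the `tab_style()` calls; the column name `shade` and its labels are illustrative and do not appear in the original code.

```r
# Values copied from the table above (Area 5, 2021, timesteps 2-4)
tab <- data.frame(
  ts   = 2:4,
  pred = c(418, 948, 15128),
  pre  = c(1796, 5326, 9356),
  pst  = c(433, 2827, 16337)
)

err_pre  <- tab$pst - tab$pre   # the pst-pre column
err_pred <- tab$pst - tab$pred  # the pst-pred column

improved  <- abs(err_pred) < abs(err_pre)  # light green condition
not_under <- err_pred <= 0                 # prediction at or above post-season

# dark green requires both conditions, matching the second tab_style() call
tab$shade <- ifelse(improved & not_under, "dark green",
                    ifelse(improved, "light green", "none"))
print(tab)
```

With these values every timestep meets the light green condition, but none meets the darker green condition, because each pst-pred difference is positive (the prediction falls below the post-season estimate).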

2 Area 5 2021, as-applied NOF2021, given 2014-2020 dockside

The table in this section replaces the single year of 2021 data with data from 2014-2020, which would have been available during NOF2021, but otherwise uses similar distributional assumptions to the preceding section: the pooled-year median hapd and the sample-rate expanded median apd. Note that 2016 and 2017 data are not available due to closures, and that mixed regulations in earlier years prompted the choice of this set of years. The table follows the same conventions as above, with the harvest estimation method result in column pred alongside the final NOF2021 coho FRAM inputs (pre) and the 2021 post-season estimate (pst). The pst-pre and pst-pred columns again show the respective differences against the post-season estimate, with values in the final column shaded light green if the new method improves in absolute error and darker green if it both improves in absolute error and does not underestimate catch.

In this more realistic test, the new method outperforms the existing approach in July and September (timesteps 2 and 4). Both approaches performed similarly poorly for the August timestep, where this application of the new method did not produce an improvement.
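The pooling step used in this section can be illustrated with a small base R sketch. The numbers are invented for a single area, timestep and day type; only the order of operations is taken from the report code, which expands each year's median anglers by that year's sampling rate before taking the median across years.

```r
# Hypothetical per-year medians for one area / timestep / day type.
# 2016 and 2017 are absent, mirroring the closure years noted above.
yearly <- data.frame(
  yr          = c(2014, 2015, 2018, 2019, 2020),
  anglers     = c(30, 50, 40, 20, 60),            # median sampled anglers per day
  sr          = c(0.20, 0.25, 0.25, 0.10, 0.40),  # sampling rate for that year
  coho_kpa_ad = c(0.10, 0.30, 0.20, 0.05, 0.25)   # median coho kept per angler per day
)

# expand within each year first ...
yearly$apd_expnd <- yearly$anglers / yearly$sr

# ... then pool across years with a median of the yearly values
pooled <- c(
  apd_expnd   = median(yearly$apd_expnd),
  coho_kpa_ad = median(yearly$coho_kpa_ad)
)
print(pooled)
```

Expanding before pooling means a year with an unusually low or high sampling rate affects only its own expanded value. Dividing a pooled median of anglers by a pooled median sampling rate would be a different calculation and would generally give a different result.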

left_join(
  #dockside median anglers-per-day and coho kept-per-angler-per-day
  pssp |> 
    filter(area_code == "05", yr %in% as.character(2014:2020), between(ts, 2, 4)) |> 
    group_by(area_code, yr, ts, wkend) |> 
    summarise(across(c(anglers, coho_kpa_ad), median), .groups = "drop"),
  #per year and timestep sampling rates (joined by yr, so no median is taken here)
  rmis_cs |> filter(area_code == "05", between(year, 2014, 2020)),
  by = c("area_code", "yr","ts")
  ) |> 
  mutate(apd_expnd = anglers / sr) |>
  #generate the pooled medians across years
  group_by(area_code, ts, wkend) |> 
  summarise(across(c(apd_expnd, coho_kpa_ad), median), .groups = "drop") |> 
  mutate(
    yr = "2021",
    df = case_when(
      wkend == "wkday" & ts %in% 2:3 ~ 18,
      wkend == "wkend" & ts %in% 2:3 ~ 13,
      wkend == "wkday" & ts == 4 ~ 18,
      wkend == "wkend" & ts == 4 ~ 12
      ),
    pred = df * apd_expnd * coho_kpa_ad
  ) |> 
  group_by(area_code, yr, ts) |> 
  summarise(pred = sum(pred), .groups = "drop") |> 
  left_join(a5_21, by = c("area_code", "yr", "ts")) |> 
  mutate(
    `pst-pre` = pst - pre,
    `pst-pred` = pst - pred
  ) |> 
  gt() |> 
  fmt_integer(columns = -c(area_code, yr)) |> 
  tab_style(
    style = cell_fill("#B9FAC4"),
    locations = cells_body(
      columns = `pst-pred`,
      rows = abs(`pst-pred`) < abs(`pst-pre`)
    )
  ) |> 
  tab_style(
    style = cell_fill("#39D155"),
    locations = cells_body(
      columns = `pst-pred`,
      rows = (abs(`pst-pred`) < abs(`pst-pre`)) & `pst-pred` <= 0
    )
  ) |> 
  gt::tab_header(title = "As-applied 2021, from 2014-2020 pooled medians") 
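As-applied 2021, from 2014-2020 pooled medians

| area_code | yr | ts | pred | pre | pst | pst-pre | pst-pred |
|---|---|---|---|---|---|---|---|
| 05 | 2021 | 2 | 995 | 1,796 | 433 | −1,363 | −562 |
| 05 | 2021 | 3 | 5,681 | 5,326 | 2,827 | −2,499 | −2,854 |
| 05 | 2021 | 4 | 10,921 | 9,356 | 16,337 | 6,981 | 5,416 |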

3 Areas 5-11, as-applied NOF2020, given 2014-2019 dockside

This section extends the previous concept to additional areas, reducing the data to a 2014-2019 window that would have been available during NOF2020 and comparing the actual preseason pre and the potential pred to the values in the 2020 CoTC FRAM postseason run (as made available to the user in the tool). The distributional assumptions and table conventions are the same as in the preceding sections.

This test demonstrates improved performance across these areas during the important September timestep 4, with the exception of A10. For A10, the 2020-only dockside values do closely reproduce the postseason estimate (similar to the first section above, not shown), so this outcome points to the need for further scrutiny in this case. Performance in this application is more mixed for the earlier summer timesteps 2 (July) and 3 (August), although several of the differences in errors are well within the range of other sources of model error. Perhaps more importantly, some instances of an apparent lack of improvement, such as A7 T3, arguably reflect the constraints of this analysis rather than a limitation of the tool. There, both pre and pred substantially underestimate the post-season value, but the harvest estimator method provides a clear depiction of recent trends in pre-post performance for this area-timestep that would prompt the use of a value from higher in the distribution than the median in actual application.

left_join(
  #dockside median anglers-per-day and coho kept-per-angler-per-day
  pssp |> 
    filter(area_code %in% c("05","06","07","09","10","11"), yr %in% as.character(2014:2019), between(ts, 2, 4)) |> 
    group_by(area_code, yr, ts, wkend) |> 
    summarise(across(c(anglers, coho_kpa), median), .groups = "drop"),
  #per year and timestep sampling rates (joined by yr, so no median is taken here)
  rmis_cs,
  by = c("area_code", "yr","ts")
  ) |>
  #exclude NA sample rates rather than try to replicate dynamic median generation
  filter(!is.na(sr)) |> 
  mutate(apd_expnd = anglers / sr) |>
  #generate the pooled medians across years
  group_by(area_code, ts, wkend) |> 
  summarise(across(c(apd_expnd, coho_kpa), median), .groups = "drop") |> 
  mutate(
    yr = "2020",
    df = case_when(
      wkend == "wkday" & ts %in% 2:3 ~ 18,
      wkend == "wkend" & ts %in% 2:3 ~ 13,
      wkend == "wkday" & ts == 4 ~ 18,
      wkend == "wkend" & ts == 4 ~ 12
    ),
    pred = df * apd_expnd * coho_kpa
  ) |> 
  group_by(area_code, yr, ts) |> 
  summarise(pred = sum(pred), .groups = "drop") |> 
  left_join(
    fs_ps_spt |> select(type, area_code, yr, ts, val),
    by = c("area_code", "yr", "ts")
  ) |> 
  pivot_wider(names_from = type, values_from = val) |> 
  mutate(
    `pst-pre` = pst - pre,
    `pst-pred` = pst - pred
  ) |> 
  gt(rowname_col = 'area_code') |> 
  cols_hide("yr") |> 
  fmt_integer(columns = -c(area_code, yr)) |> 
  tab_style(
    style = cell_fill("#B9FAC4"),
    locations = cells_body(
      columns = `pst-pred`,
      rows = abs(`pst-pred`) < abs(`pst-pre`)
    )
  ) |> 
  tab_style(
    style = cell_fill("#39D155"),
    locations = cells_body(
      columns = `pst-pred`,
      rows = (abs(`pst-pred`) < abs(`pst-pre`)) & `pst-pred` <= 0
    )
  ) |> 
  gt::tab_header(title = "As-applied 2020, from 2014-2019 pooled medians") 
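As-applied 2020, from 2014-2019 pooled medians

| area_code | ts | pred | pre | pst | pst-pre | pst-pred |
|---|---|---|---|---|---|---|
| 05 | 2 | 712 | 2,334 | 1,679 | −655 | 967 |
| 05 | 3 | 4,113 | 5,628 | 5,054 | −574 | 941 |
| 05 | 4 | 10,731 | 6,537 | 10,055 | 3,518 | −676 |
| 06 | 2 | 157 | 109 | 112 | 3 | −45 |
| 06 | 3 | 131 | 650 | 883 | 233 | 752 |
| 06 | 4 | 4,009 | 972 | 4,289 | 3,317 | 280 |
| 07 | 2 | 0 | 76 | 28 | −48 | 28 |
| 07 | 3 | 157 | 450 | 1,498 | 1,048 | 1,341 |
| 07 | 4 | 2,875 | 1,605 | 9,301 | 7,696 | 6,426 |
| 09 | 2 | 859 | 650 | 1,700 | 1,050 | 841 |
| 09 | 3 | 2,074 | 2,512 | 2,190 | −322 | 116 |
| 09 | 4 | 8,840 | 14,005 | 2,722 | −11,283 | −6,118 |
| 10 | 2 | 2,520 | 3,022 | 6,064 | 3,042 | 3,544 |
| 10 | 3 | 2,287 | 3,380 | 5,225 | 1,845 | 2,938 |
| 10 | 4 | 10,347 | 13,221 | 13,935 | 714 | 3,588 |
| 11 | 2 | 102 | 184 | 359 | 175 | 257 |
| 11 | 3 | 761 | 775 | 1,005 | 230 | 244 |
| 11 | 4 | 1,455 | 1,397 | 2,233 | 836 | 778 |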
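The pattern across timesteps can be tallied from the values in this table. The sketch below re-enters the pred, pre and pst columns by hand and counts, for each timestep, the areas where the new method has the smaller absolute error; the helper column `improved` is illustrative and not part of the original code.

```r
# Values copied from the table above (areas in order 05, 06, 07, 09, 10, 11;
# timesteps 2-4 within each area)
tab <- data.frame(
  area_code = rep(c("05", "06", "07", "09", "10", "11"), each = 3),
  ts   = rep(2:4, times = 6),
  pred = c(712, 4113, 10731, 157, 131, 4009, 0, 157, 2875,
           859, 2074, 8840, 2520, 2287, 10347, 102, 761, 1455),
  pre  = c(2334, 5628, 6537, 109, 650, 972, 76, 450, 1605,
           650, 2512, 14005, 3022, 3380, 13221, 184, 775, 1397),
  pst  = c(1679, 5054, 10055, 112, 883, 4289, 28, 1498, 9301,
           1700, 2190, 2722, 6064, 5225, 13935, 359, 1005, 2233)
)

# TRUE where the new method's absolute error is smaller than the preseason input's
tab$improved <- abs(tab$pst - tab$pred) < abs(tab$pst - tab$pre)

# number of areas (out of 6) improved in each timestep
improved_by_ts <- tapply(tab$improved, tab$ts, sum)
print(improved_by_ts)

# areas that did not improve in timestep 4
print(tab$area_code[tab$ts == 4 & !tab$improved])
```

On these values the new method has the smaller absolute error in 2 of 6 areas for timestep 2, 1 of 6 for timestep 3, and 5 of 6 for timestep 4, with Area 10 the only timestep 4 exception.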

4 Conclusion

This analysis has shown that in a “default” mode, uninformed by managers’ experience and expectations for a given year, the harvest estimator method was capable of generating more accurate coho FRAM inputs than the preseason values in a number of the area-timesteps examined, most consistently in September (timestep 4). However, performance gains were not uniform across areas and timesteps, and the comparison underscored the importance of ongoing refinement. This approach is well-positioned to take advantage of increasingly rich datasets as additional monitoring investments are made in Puget Sound recreational coho fisheries.