Sometimes a visualisation is easier to read when using log2, log10 or log scale instead of the original scale, especially when the values are very large or skewed. However the y-axis labels will also be converted to the log scales. The functions here are developed for converting back to the original scale for readability.

The following options are included in fhiplot package:

  • log 2 scale: format_nor_invlog2_1, format_nor_invlog2_2
  • log 10 scale: format_nor_invlog10_1, format_nor_invlog10_2
  • natural log (e) scale: format_nor_invloge_1, format_nor_invloge_2

For instance, if you have plotted \(y = log2(x)\), format_nor_invlog2_1 formats the numerical values \(y\) on the y-axis by transforming \(x = 2^y\), then it keeps 1 digit after decimal point.

Example: excess mortality

library(ggplot2)
library(fhiplot)
#> fhiplot 2022.2.22 https://folkehelseinstituttet.github.io/fhiplot
library(magrittr)

Simulate fictional data

We simulate some fictional data for demonstration. We simulate 2-year weekly data of mortality cases in two locations, county A and county B. The count of cases follow the Poisson distribution with different \(\lambda\). For county A \(\lambda_{observed} = 1000, \lambda_{expected} = 500\). For county B \(\lambda_{observed} = 1000, \lambda_{expected} = 2000\).

yrwk_19_20 <- fhidata::world_dates_isoyearweek[year %in% c(2019, 2020),
                                               .(year, yrwk)]

simulate_case <- function(yrwk_df, lambda_expected, lambda_observed, location_code){
  nweeks <- nrow(yrwk_df)
  n_observed <- rpois(nweeks, lambda_observed)
  n_expected <- rpois(nweeks, lambda_expected)

  df <- data.frame(deaths_n_observed = n_observed,
                   deaths_n_expected = n_expected,
                   location_code = location_code)
  df <- cbind(yrwk_df, df)
  return(df)
}


set.seed(1)
df_loc1 <- simulate_case(yrwk_df = yrwk_19_20,
                         lambda_expected = 1000,
                         lambda_observed = 500,
                         location_code = "county_A")
df_loc2 <- simulate_case(yrwk_df = yrwk_19_20,
                         lambda_expected = 1000,
                         lambda_observed = 2000,
                         location_code = "county_B")

dd <- rbind(df_loc1, df_loc2)
dd
#>      year    yrwk deaths_n_observed deaths_n_expected location_code
#>   1: 2019 2019-01               485               998      county_A
#>   2: 2019 2019-02               529               978      county_A
#>   3: 2019 2019-03               528              1047      county_A
#>   4: 2019 2019-04               509               984      county_A
#>   5: 2019 2019-05               465              1016      county_A
#>  ---                                                               
#> 206: 2020 2020-49              2027               980      county_B
#> 207: 2020 2020-50              2024              1007      county_B
#> 208: 2020 2020-51              2040               986      county_B
#> 209: 2020 2020-52              2119               972      county_B
#> 210: 2020 2020-53              1991              1037      county_B

First we visualise the fictional observed death over 2 years.

# separate the yrwk string to produce week id

dd$week <- fhiplot::isoyearweek_to_week_n(dd$yrwk)
dd$year <- as.factor(dd$year)

# plot cases
q <- ggplot(dd, aes(x = week, y = deaths_n_observed, color = year))
q <- q + geom_line(lwd = 1.3)
q <- q + facet_wrap(~ location_code, nrow = 2, scale = 'free')
# use fhi themes
q <- q + fhiplot::theme_fhi_basic(10)
q <- q + fhiplot::scale_color_fhi("", palette = "primary", direction = -1)
q <- q + fhiplot::theme_fhi_lines_horizontal()
q <- q + guides(color = guide_legend(reverse = FALSE))
q <- q + labs(title = "Observed deaths (simulated data)", 
              x = "Week",
              y = "Number of death cases")
q

Cumulative excess mortality and ratio

We can compute the cumulative excess mortality per year, as well as the ratio of cumulative observed deaths and expected deaths for each location, each year.

# compute excess and ratio by year
dd[, cum_excess := cumsum(deaths_n_observed - deaths_n_expected),
   by = list(location_code, year)]
dd[, cum_excess_ratio := cumsum(deaths_n_observed)/cumsum(deaths_n_expected),
   by = list(location_code, year)]
dd
#>      year    yrwk deaths_n_observed deaths_n_expected location_code week
#>   1: 2019 2019-01               485               998      county_A    1
#>   2: 2019 2019-02               529               978      county_A    2
#>   3: 2019 2019-03               528              1047      county_A    3
#>   4: 2019 2019-04               509               984      county_A    4
#>   5: 2019 2019-05               465              1016      county_A    5
#>  ---                                                                    
#> 206: 2020 2020-49              2027               980      county_B   49
#> 207: 2020 2020-50              2024              1007      county_B   50
#> 208: 2020 2020-51              2040               986      county_B   51
#> 209: 2020 2020-52              2119               972      county_B   52
#> 210: 2020 2020-53              1991              1037      county_B   53
#>      cum_excess cum_excess_ratio
#>   1:       -513        0.4859719
#>   2:       -962        0.5131579
#>   3:      -1481        0.5100893
#>   4:      -1956        0.5118543
#>   5:      -2507        0.5008959
#>  ---                            
#> 206:      48914        1.9966990
#> 207:      49931        1.9969650
#> 208:      50985        1.9983552
#> 209:      52132        2.0017486
#> 210:      53086        2.0001507

Plot on original scale

We plot the ratio on the original scale. Need to format the y-axis using format_nor_num_2 for 2 digits.

max_val <- max(abs(dd$cum_excess_ratio))
q <- ggplot(dd, aes(x=week, y=cum_excess_ratio, color = year))
q <- q + geom_line(lwd = 1.3)
q <- q + geom_hline(yintercept = 1, lty = 2)
q <- q + facet_wrap(~ location_code, nrow = 1, scale = 'free')

q <- q + fhiplot::theme_fhi_basic(10)
q <- q + fhiplot::scale_color_fhi("", palette = "primary", direction = -1)
q <- q + guides(color = guide_legend(reverse = FALSE))
q <- q + scale_y_continuous(breaks = fhiplot::pretty_breaks(8),
                            labels = format_nor_num_2,
                            lim = c(0, max_val*1.2))
q <- q + labs(title = "Ratio of cumulative observed versus expected mortality (simulated data)", 
              x = "Week",
              y = "Ratio of cumulative mortality")
q

Plot on log2 scale

Now we can plot on the log-2 scale by taking y = log2(cum_excess_ratio), while still keeping the y-axis to the original scale. In this way the visual distance between the ratio and 1 are the same even though the actual distances are \(0.5 (|0.5-1|)\) and \(1 (|2-1|)\) respectively.

max_val <- max(abs(log2(dd$cum_excess_ratio)))
q <- ggplot(dd, aes(x=week, y=log2(cum_excess_ratio), color = year))
q <- q + geom_line(lwd = 1.3)
q <- q + geom_hline(yintercept = 0, lty = 2)
q <- q + facet_wrap(~ location_code, scale = 'free')

q <- q + fhiplot::theme_fhi_basic(10)
q <- q + fhiplot::scale_color_fhi("", palette = "primary", direction = -1)
q <- q + guides(color = guide_legend(reverse = FALSE))
q <- q + scale_y_continuous(breaks = fhiplot::pretty_breaks(9),
                            labels = format_nor_invlog2_2,
                            lim = c(-max_val, max_val))
q <- q + labs(title = "Ratio of cumulative observed versus expected mortality (simulated data)", 
              x = "Week",
              y = "Ratio of cumulative mortality")

q