Introduction to simK

Main functions

This package has three main functions. With them we can generate simulated data for a pool of donors, a set of kidney transplant candidates, and the HLA antibodies of those candidates who are HLA sensitized.

library(simK)

Donors

A data frame with information for a pool of simulated donors can be generated with the function donors_df():

donors_df(n = 10, 
          replace = TRUE,
          origin = 'PT',
          probs = c(0.4658, 0.0343, 0.077, 0.4229),
          lower=18, upper=75,
          mean = 60, sd = 12,
          uk = FALSE,
          seed.number = 3)
#> # A tibble: 10 × 9
#>    ID    bg    A1    A2    B1    B2    DR1   DR2     age
#>    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#>  1 D1    A     1     29    8     44    11    7        53
#>  2 D2    O     2     3     7     57    4     8        57
#>  3 D3    A     11    33    14    35    1     13       61
#>  4 D4    O     24    30    49    58    11    11       62
#>  5 D5    B     2     3     7     51    1     13       66
#>  6 D6    A     30    68    15    18    3     4        45
#>  7 D7    O     3     26    18    40    11    13       52
#>  8 D8    B     1     1     7     8     3     13       55
#>  9 D9    O     3     3     7     44    15    8        75
#> 10 D10   A     11    29    44    57    7     7        64

For a given number of rows n, a data frame is generated with columns:

ID unique identifier with the prefix ‘D’;
bg with the blood group generated from the parameter probs, a vector with the probabilities for groups A, AB, B and O, respectively;
A1, A2, B1, B2, DR1, DR2 HLA typing obtained according to origin option (with replace = TRUE we can generate a data frame without limitations on the number of rows);
age generated from a Normal distribution with mean and sd given by the user, values truncated by lower and upper boundaries;
DRI when option uk = TRUE, the Donor Risk Index is computed as described by transplantr.
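
The bg and age columns described above can be sketched in base R. This is only an illustration under stated assumptions: the helpers sim_bg() and sim_age() are hypothetical names, not part of simK, and truncation is done here by redrawing out-of-range values, which may differ from what donors_df() does internally.

```r
# Hypothetical helpers for illustration; not part of simK.

# Draw blood groups with probabilities given in the order A, AB, B, O.
sim_bg <- function(n, probs = c(0.4658, 0.0343, 0.077, 0.4229)) {
  sample(c("A", "AB", "B", "O"), size = n, replace = TRUE, prob = probs)
}

# Draw ages from a Normal(mean, sd), redrawing any value outside
# [lower, upper] (assumption: simK may truncate in a different way).
sim_age <- function(n, mean = 60, sd = 12, lower = 18, upper = 75) {
  age <- round(rnorm(n, mean, sd))
  out <- age < lower | age > upper
  while (any(out)) {
    age[out] <- round(rnorm(sum(out), mean, sd))
    out <- age < lower | age > upper
  }
  age
}

set.seed(3)
toy_donors <- data.frame(ID  = paste0("D", 1:10),
                         bg  = sim_bg(10),
                         age = sim_age(10))
```

Redrawing keeps the shape of the Normal distribution inside the interval, whereas clamping values to the boundaries would pile up observations at lower and upper.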

The HLA population origin currently accepts ‘PT’ for Portuguese, plus the populations available from the US National Marrow Donor Program (NMDP):

‘API’ - Asian / Pacific Islander
‘AFA’ - African American / Black
‘CAU’ - White / Caucasian
‘HIS’ - Hispanic

Defining seed.number allows for reproducibility.
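
As a minimal sketch of what a seed argument buys, the hypothetical function draw_ages() below (not a simK function) fixes the random number generator state before drawing, so two calls with the same seed.number return identical values. Whether simK sets the seed in exactly this way is an assumption.

```r
# Hypothetical wrapper illustrating a seed.number argument; not simK code.
draw_ages <- function(n, seed.number = 123) {
  set.seed(seed.number)                 # fix the RNG state before sampling
  round(rnorm(n, mean = 60, sd = 12))
}

a <- draw_ages(5, seed.number = 3)
b <- draw_ages(5, seed.number = 3)
identical(a, b)  # TRUE: same seed, same draws
```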

Note: to compute DRI as described in transplantr, we generate the following variables: height (\(N(165,20)\)); hypertension (with probability \(0.43\)); sex (with probability \(0.55\) for male); CMV+ (with probability \(0.9\)); hospital stay (\(P(\lambda = 4)\)); and GFR by age (<30 \(N(116,10)\); 30-39 \(N(107,10)\); 40-49 \(N(99,10)\); 50-59 \(N(93,10)\); 60-69 \(N(85,10)\); >=70 \(N(75,10)\)).
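
The auxiliary variables listed above could be generated along the following lines. This is a sketch rather than the package's code: the column names are hypothetical, the example ages are made up, and the second parameter of each \(N(\cdot,\cdot)\) is read here as a standard deviation, which is an assumption.

```r
# Hypothetical sketch of the auxiliary donor variables; not simK's own code.
set.seed(3)
n   <- 10
age <- c(25, 34, 41, 47, 52, 58, 63, 66, 70, 74)  # made-up donor ages

# Mean GFR by age band: <30, 30-39, 40-49, 50-59, 60-69, >=70
gfr_means <- c(116, 107, 99, 93, 85, 75)
band <- cut(age, breaks = c(-Inf, 30, 40, 50, 60, 70, Inf),
            right = FALSE, labels = FALSE)        # integer band index 1..6

aux <- data.frame(
  age          = age,
  height       = rnorm(n, mean = 165, sd = 20),   # N(165, 20), sd assumed
  hypertension = rbinom(n, 1, 0.43) == 1,         # TRUE with probability 0.43
  sex          = ifelse(rbinom(n, 1, 0.55) == 1, "M", "F"),
  cmv          = rbinom(n, 1, 0.9) == 1,          # CMV+ with probability 0.9
  hosp_stay    = rpois(n, lambda = 4),            # Poisson(4)
  gfr          = rnorm(n, mean = gfr_means[band], sd = 10)
)
```

Using cut() with right = FALSE makes each band closed on the left, so a donor aged exactly 30 falls in the 30-39 band and one aged 70 in the >=70 band.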

Candidates

A simulated waiting list of kidney transplant candidates can be generated with candidates_df():

candidates_df(n = 10, 
              replace = TRUE,
              origin = 'PT',
              probs.abo = c(0.43, 0.03, 0.08, 0.46),
              probs.cpra = c(0.7, 0.1, 0.1, 0.1),
              lower=18, upper=75,
              mean = 45, sd = 15,
              prob.dm = 0.12,
              prob.urgent = 0.05,
              uk = FALSE,
              seed.number = 3)
#> # A tibble: 10 × 13
#>    ID    bg    A1    A2    B1    B2    DR1   DR2     age  cPRA hiper dialysis
#>    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <lgl>    <dbl>
#>  1 K1    O     1     29    8     44    11    7        37     0 FALSE       80
#>  2 K2    A     2     3     7     57    4     8        42     0 FALSE       49
#>  3 K3    O     11    33    14    35    1     13       68     0 FALSE       49
#>  4 K4    O     24    30    49    58    11    11       46     0 FALSE       36
#>  5 K5    A     2     3     7     51    1     13       47    76 FALSE       38
#>  6 K6    A     30    68    15    18    3     4        71    25 FALSE       44
#>  7 K7    O     3     26    18    40    11    13       52    91 TRUE        99
#>  8 K8    O     1     1     7     8     3     13       26     0 FALSE       69
#>  9 K9    A     3     3     7     44    15    8        35     0 FALSE       27
#> 10 K10   A     11    29    44    57    7     7        38     0 FALSE       10
#> # ℹ 1 more variable: urgent <dbl>

For a given number of rows n, a data frame is generated with columns:

ID unique identifier with the prefix ‘K’;
bg with the blood group generated from the parameter probs.abo, a vector with the probabilities for groups A, AB, B and O, respectively (by default, we assume that group O patients are the most frequent);
A1, A2, B1, B2, DR1, DR2 HLA typing obtained according to origin option (with replace = TRUE we can generate a data frame without limitations on the number of rows);
age generated from a Normal distribution with mean and sd given by the user, values truncated by lower and upper boundaries;
dialysis time on dialysis in months, with values depending on the patient’s blood group and hypersensitization status (cPRA > 85%): for hypersensitized patients with blood group O, time on dialysis is drawn from \(N(85,20)\); for patients who are blood group O or hypersensitized, from \(N(70,20)\); for the remaining patients, from \(N(35,20)\);
cPRA patients are classified into groups with probabilities given by probs.cpra for 0%, 1%-50%, 51%-85% and 86%-100%, respectively. Within the groups above 0%, cPRA values are drawn at random from the distributions \(P(\lambda = 30)\), \(P(\lambda = 70)\) and \(P(\lambda = 90)\), respectively;
Tier patients are classified into two tiers as described in POL186/11 – Kidney Transplantation: Deceased Donor Organ Allocation from UK transplant. Tier A contains patients with MS = 10, cPRA = 100% or time on dialysis > 7 years; all remaining patients are classified as Tier B;
MS matchability score, given by the deciles of the number of donors in dataset D10K that are a match for each transplant candidate. This score takes into account a patient’s blood type, HLA type and cPRA value. A patient with MS = 1 is defined as easy to match and one with MS = 10 as difficult to match;
RRI when option uk = TRUE, the Recipient Risk Index is computed as described by transplantr. To compute RRI, the variables age, time on dialysis (in days) and the probability of being diabetic (obtained from prob.dm) are used. We also assume that all patients were on dialysis at the time of listing;
urgent a dichotomous variable that takes the value 1 for clinically urgent patients. It is generated from prob.urgent.
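
The cPRA and dialysis rules in the list above can be sketched in base R as follows. This is a rough illustration, not the package's implementation: the variable names are hypothetical, clamping cPRA to the limits of its group and flooring dialysis time at zero are assumptions, and the second parameter of \(N(\cdot,\cdot)\) is read as a standard deviation.

```r
# Hypothetical sketch; names and truncation choices are assumptions.
set.seed(3)
n  <- 1000
bg <- sample(c("A", "AB", "B", "O"), n, replace = TRUE,
             prob = c(0.43, 0.03, 0.08, 0.46))

# 1. Assign each patient to a cPRA group: 0%, 1-50%, 51-85%, 86-100%
probs.cpra <- c(0.7, 0.1, 0.1, 0.1)
grp <- sample(1:4, n, replace = TRUE, prob = probs.cpra)

# 2. In the non-zero groups draw from Poisson(30), Poisson(70), Poisson(90),
#    then clamp to the group limits (clamping is an assumption)
lambda <- c(NA, 30, 70, 90)
lo     <- c(0, 1, 51, 86)
hi     <- c(0, 50, 85, 100)
cPRA   <- numeric(n)
pos    <- grp > 1
cPRA[pos] <- rpois(sum(pos), lambda[grp[pos]])
cPRA <- pmin(pmax(cPRA, lo[grp]), hi[grp])

# 3. Time on dialysis (months) by blood group and hypersensitization
hiper <- cPRA > 85
mu <- ifelse(bg == "O" & hiper, 85,
      ifelse(bg == "O" | hiper, 70, 35))
dialysis <- pmax(0, round(rnorm(n, mean = mu, sd = 20)))  # floor at 0 assumed
```

The nested ifelse() checks the combined condition first, so patients who are both blood group O and hypersensitized get the \(N(85,20)\) mean rather than the \(N(70,20)\) one.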

The HLA population origin can be set to ‘PT’, ‘API’, ‘AFA’, ‘CAU’ or ‘HIS’, as described for donors_df().

Defining seed.number allows for reproducibility.

HLA antibodies

The function Abs_df() generates a data frame with HLA antibodies for a candidates waiting list:

Abs_df(candidates = candidates_df(n=10),
       origin = 'PT',
       seed.number = 3)
#> # A tibble: 35 × 2
#>    ID    abs  
#>    <chr> <chr>
#>  1 K5    A25  
#>  2 K5    DR4  
#>  3 K5    A34  
#>  4 K5    B53  
#>  5 K5    B49  
#>  6 K5    DR4  
#>  7 K5    B54  
#>  8 K5    B57  
#>  9 K5    B44  
#> 10 K5    A30  
#> # ℹ 25 more rows

As input, this function requires a data set with an ID and the patients’ HLA information (HLA typing and cPRA value) in the same format as provided by candidates_df(). Defining seed.number allows for reproducibility.
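
The output shown above is in long format, with one row per candidate and antibody, and the same pair can appear more than once (K5 and DR4, for instance). A minimal sketch of how such a table might be summarised in base R follows; the toy values are made up for illustration and the object names are hypothetical.

```r
# Toy data in the same two-column layout as the Abs_df() output
# (values are made up for illustration).
abs_toy <- data.frame(
  ID  = c("K5", "K5", "K5", "K6", "K6", "K7"),
  abs = c("A25", "DR4", "DR4", "B53", "A30", "B44"),
  stringsAsFactors = FALSE
)

# Drop repeated ID/antibody pairs, then count antibodies per candidate
abs_unique <- unique(abs_toy)
n_abs <- table(abs_unique$ID)

# Collect each candidate's antibodies into a list keyed by ID
abs_by_id <- split(abs_unique$abs, abs_unique$ID)
```

Keeping the data long and deduplicating with unique() avoids double counting when tallying how many distinct antibodies each candidate has.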

The HLA population origin must match the one used in candidates_df().

For PT origin, all these functions rely on HLA typing at intermediate resolution as described in Lima et al., 2013.

For NMDP populations, HLA typing is as described by Gragert et al., 2013.