R/evolve_model_ntimes.R
evolve_model_ntimes.Rd
evolve_model uses a genetic algorithm to estimate a finite-state machine model, primarily for understanding and predicting decision-making.
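To make "finite-state machine model of decision-making" concrete, here is a minimal R sketch of a two-state, two-action machine that plays tit-for-tat: each state has an action, and the observed predictor (the other player's last action) decides the next state. The names `action_of_state`, `transitions`, and `run_fsm` are hypothetical and chosen for illustration. This is not the encoding that evolve_model uses internally; it only shows the kind of structure the genetic algorithm searches over.

```r
# Hypothetical two-state, two-action FSM (illustration only, not datafsm's format).
# Actions: 1 = cooperate, 2 = defect.
action_of_state <- c(1L, 2L)             # state 1 -> cooperate, state 2 -> defect

# Rows are the current state; columns are the other player's previous action.
# Entry [s, a] is the next state after observing action a while in state s.
transitions <- matrix(c(1L, 2L,          # from state 1
                        1L, 2L),         # from state 2
                      nrow = 2, byrow = TRUE)

run_fsm <- function(action_of_state, transitions, other_actions, start = 1L) {
  state <- start
  actions <- integer(length(other_actions))
  for (t in seq_along(other_actions)) {
    actions[t] <- action_of_state[state]           # act according to the current state
    state <- transitions[state, other_actions[t]]  # move based on what the other player did
  }
  actions
}

run_fsm(action_of_state, transitions, other_actions = c(1L, 2L, 2L, 1L))
```

With this transition matrix the machine cooperates first and then copies the other player's previous action, so the call above yields `c(1L, 1L, 2L, 2L)`.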
evolve_model_ntimes(data, test_data = NULL, drop_nzv = FALSE, measure = c("accuracy", "sens", "spec", "ppv"), states = NULL, cv = FALSE, max_states = NULL, k = 2, actions = NULL, seed = NULL, popSize = 75, pcrossover = 0.8, pmutation = 0.1, maxiter = 50, run = 25, parallel = FALSE, priors = NULL, verbose = TRUE, return_best = TRUE, ntimes = 10, cores = NULL)
Argument | Description |
---|---|
data | A |
test_data | Optional |
drop_nzv | Optional logical vector length one specifying whether predictor variables with near-zero variance in the provided data should be dropped before model building. Default is FALSE. |
measure | Optional character vector length one that is one of "accuracy", "sens", "spec", or "ppv". This specifies which measure of predictive performance to use for training and evaluating the model. The default measure is "accuracy". |
states | Optional numeric vector with the number of states. If not provided, it will be set to |
cv | Optional logical vector length one specifying whether cross-validation should be conducted on the training data to select the optimal number of states. This can drastically increase computation time because if |
max_states | Optional numeric vector length one only relevant if |
k | Optional numeric vector length one only relevant if cv==TRUE, specifying number of folds for cross-validation. |
actions | Optional numeric vector with the number of actions. If not provided, then actions will be set as the number of unique values in the outcome vector. |
seed | Optional numeric vector length one specifying the random seed. |
popSize | Optional numeric vector length one specifying the size of the GA population. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. |
pcrossover | Optional numeric vector length one specifying probability of crossover for GA. This is passed to the GA::ga() function of the GA package. |
pmutation | Optional numeric vector length one specifying probability of mutation for GA. This is passed to the GA::ga() function of the GA package. |
maxiter | Optional numeric vector length one specifying the max number of iterations for stopping the GA evolution. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. |
run | Optional numeric vector length one specifying max number of consecutive iterations without improvement in best fitness score for stopping the GA evolution. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. |
parallel | Optional logical vector length one specifying whether to run the GA evolution in parallel. Depending on the number of cores registered and the memory on your machine, this can make the process much faster, but it only works on Unix-based machines that can fork processes. |
priors | Optional numeric matrix of solution strings to be included in the initialization. The user needs to use a decoder function to translate prior decision models into bits and then provide them. If this is not specified, then random priors are automatically created. |
verbose | Optional logical vector length one specifying whether helpful messages should be displayed on the user's console or not. |
return_best | Optional logical vector length one specifying whether to return just the best model or all models. Only relevant if ntimes > 1. Default is TRUE. |
ntimes | Optional integer vector length one specifying the number of times to estimate the model. Default is 10. |
cores | Optional integer vector length one specifying the number of cores to use if parallel is TRUE. |
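The four options of `measure` are standard binary-classification metrics. The sketch below computes them from predicted and observed outcome vectors so the difference between them is easy to see. The helper `performance_measures` and its `positive` argument (which outcome value counts as the positive class) are hypothetical; datafsm may compute these measures through different code or with a different positive-class convention.

```r
# Hypothetical helper: the four metrics named by `measure`, for two-valued outcomes.
# `positive` is the outcome value treated as the positive class (an assumption).
performance_measures <- function(predicted, observed, positive = 1) {
  tp <- sum(predicted == positive & observed == positive)  # true positives
  tn <- sum(predicted != positive & observed != positive)  # true negatives
  fp <- sum(predicted == positive & observed != positive)  # false positives
  fn <- sum(predicted != positive & observed == positive)  # false negatives
  c(accuracy = (tp + tn) / length(observed),  # share of all predictions that are correct
    sens     = tp / (tp + fn),                # sensitivity: true positive rate
    spec     = tn / (tn + fp),                # specificity: true negative rate
    ppv      = tp / (tp + fp))                # positive predictive value (precision)
}

observed  <- c(1, 1, 1, 2, 2, 2, 2, 1)
predicted <- c(1, 1, 2, 2, 2, 1, 1, 1)
m <- performance_measures(predicted, observed)
m
```

Here there are 3 true positives, 2 true negatives, 2 false positives, and 1 false negative, giving accuracy 0.625, sens 0.75, spec 0.5, and ppv 0.6. A model tuned for "sens" may trade away specificity, which is why the choice of `measure` matters.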
Returns a list where each element is an S4 object of class ga_fsm. See ga_fsm for the details of the slots (objects) that this type of object will have and for information on the methods that can be used to summarize the calling and execution of evolve_model(), including summary, print, and plot.
This function of the datafsm package applies the evolve_model
function multiple times and then returns a list with either all the models or
the best one.
evolve_model uses a stochastic meta-heuristic optimization routine to estimate the parameters that define an FSM model. Because this routine is not guaranteed to return the best result, we run it many times.
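The restart strategy described above can be sketched in a few lines of R. In this sketch, `stochastic_fit` is a hypothetical stand-in for evolve_model (a plain random search on a toy fitness function, not a genetic algorithm), and `fit_ntimes` mimics the `ntimes` and `return_best` arguments. It keeps the result a list in both cases, matching the "Returns a list" description; it is not the package's implementation.

```r
# Hypothetical stand-in for one stochastic optimization run:
# random search over [-10, 10] that keeps the point with the highest fitness.
stochastic_fit <- function(fitness, n_iter = 50) {
  best_x <- runif(1, -10, 10)
  best_f <- fitness(best_x)
  for (i in seq_len(n_iter)) {
    x <- runif(1, -10, 10)
    f <- fitness(x)
    if (f > best_f) { best_x <- x; best_f <- f }
  }
  list(x = best_x, fitness = best_f)
}

# Run the optimizer `ntimes` times; return all fits or only the best one.
fit_ntimes <- function(fitness, ntimes = 10, return_best = TRUE) {
  fits <- lapply(seq_len(ntimes), function(i) stochastic_fit(fitness))
  if (!return_best) return(fits)
  scores <- vapply(fits, function(f) f$fitness, numeric(1))
  fits[which.max(scores)]  # list of length one holding the best fit
}

set.seed(1)
toy_fitness <- function(x) -(x - 3)^2  # maximum fitness of 0 at x = 3
best     <- fit_ntimes(toy_fitness, ntimes = 5)
all_fits <- fit_ntimes(toy_fitness, ntimes = 5, return_best = FALSE)
```

Each independent run can land on a different solution, so keeping the highest-scoring run reduces the chance of reporting a poor local optimum, at the cost of roughly `ntimes` times the computation.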
if (FALSE) {
# Create data:
cdata <- data.frame(period = rep(1:10, 1000),
                    outcome = rep(1:2, 5000),
                    my.decision1 = sample(1:0, 10000, TRUE),
                    other.decision1 = sample(1:0, 10000, TRUE))
# Estimate the model twice and return only the best model:
(res <- evolve_model_ntimes(cdata, ntimes = 2))
# Estimate the model twice and return all models:
(res <- evolve_model_ntimes(cdata, return_best = FALSE, ntimes = 2))
}