Updates for compatibility with ggnewscale. Thanks @eliocamp for the PR.
Removes defunct tests after the ggplot2 update.
plot_confusion_matrix() now shows the total count when add_normalized = FALSE. Thanks @JianWang2016 for reporting the issue.
Clarifies in the documentation that the Balanced Accuracy metric in multiclass classification is the macro-averaged metric, not the average-recall metric that is sometimes used.
Breaking: Adds a slight 3D tile effect to help separate tiles with the same count. Not tested in all many-class scenarios.
Fixes image sizing (arrows and zero-shading) when there are different numbers of unique classes in targets and predictions.
Fixes bug with the class_order argument when there are different numbers of unique classes in targets and predictions.
NEW: We created a Plot Confusion Matrix web application! It allows using plot_confusion_matrix() without code. Select from multiple design templates or make your own.
For the palette and sums_settings(palette) arguments, tile color palettes can now be a custom gradient. Simply supply a named list with hex colors for "low" and "high", e.g. list("low" = "#B1F9E8", "high" = "#239895").
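A minimal sketch of the gradient palette in use. The data and the nested-tibble extraction (conf_mat$`Confusion Matrix`[[1]]) are illustrative assumptions, not from the changelog:

```r
# Hedged sketch: plotting with a custom gradient palette (made-up data).
library(cvms)

targets     <- c("A", "B", "A", "B", "B", "A", "A", "B")
predictions <- c("A", "B", "B", "B", "A", "A", "A", "B")

conf_mat <- confusion_matrix(targets = targets, predictions = predictions)

plot_confusion_matrix(
  conf_mat$`Confusion Matrix`[[1]],
  palette = list("low" = "#B1F9E8", "high" = "#239895")  # custom gradient
)
```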
Adds intensity_by, intensity_lims, and intensity_beyond_lims arguments to sum_tile_settings() to allow setting them separately for the sum tiles.
Adds intensity_lims argument, which allows setting a custom range for the tile color intensities. This makes it easier to compare plots for different prediction sets.
Adds intensity_beyond_lims for specifying how to handle counts/percentages outside the specified intensity_lims. The default is to truncate the intensities.
Fixes bug where the arrow size did not take add_sums into account.
Adds option to set intensity_by to a log/arcsinh-transformed version of the counts. This adds the options "log counts", "log2 counts", "log10 counts", and "arcsinh counts" to the intensity_by argument.
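A hedged sketch combining the new intensity arguments. The data is made up; apart from "log counts" (listed above), the argument values (the limits and the "truncate" string) are assumptions to check against the documentation:

```r
# Hedged sketch: controlling tile color intensity (made-up data; values illustrative).
library(cvms)

set.seed(1)
targets     <- sample(c("A", "B", "C"), 200, replace = TRUE)
predictions <- sample(c("A", "B", "C"), 200, replace = TRUE)
conf_mat    <- confusion_matrix(targets = targets, predictions = predictions)

plot_confusion_matrix(
  conf_mat$`Confusion Matrix`[[1]],
  intensity_by          = "log counts",  # or "counts", "normalized", "log2 counts", ...
  intensity_lims        = c(0, 100),     # a fixed range makes plots comparable
  intensity_beyond_lims = "truncate"     # how to handle values outside the range
)
```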
Fixes bug when add_sums = TRUE and counts_on_top = TRUE.
Raises error for negative counts.
Fixes zero-division when all counts are 0.
Sets palette colors to lowest value when all counts are 0.
In plot_confusion_matrix(), adds the sub_col argument for passing in text to replace the bottom text (counts by default).
In plot_confusion_matrix(), fixes the direction of arrows when class_order is specified.
In update_hyperparameters(), allows the hyperparameters argument to be NULL. Thanks @ggrothendieck for reporting the issue.
In relevant contexts: Informs the user once that the positive argument in evaluate() and cross_validate*() does not affect the interpretation of probabilities. I had forgotten about this myself in a project, so it seems useful to remind us all about it :-)
Fixes usage of the "all" name in set_metrics() after the purrr v1.0.0 update.
Makes testing conditional on the availability of xpectr.
Fixes tidyselect-related warnings.
Compatibility updates for parameters 0.19.0. Thanks to @strengejacke.
Fixes tests for CRAN.
Adds merDeriv as a suggested package.
Compatibility updates for parameters 0.15.0. Thanks to @strengejacke.
Compatibility updates for checkmate 2.1.0.
Updates usage of ggplot2 functions. Now compatible with ggplot2 3.3.4.
In order to reduce dependencies, model coefficients are now tidied with the parameters package instead of broom and broom.mixed. Thanks to @IndrajeetPatil for the contributions.
In cross_validate() and cross_validate_fn(), fold columns can now have a varying number of folds in repeated cross-validation. Struggling to choose a number of folds? Average over multiple settings.
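A hedged sketch of averaging over several fold settings. It assumes a groupdata2 version where fold() accepts one k per fold column; the data is made up:

```r
# Hedged sketch: repeated CV with a different number of folds per fold column.
library(cvms)
library(groupdata2)

set.seed(1)
dat <- data.frame(score = rnorm(60), age = rnorm(60, mean = 40, sd = 10))
dat <- fold(dat, k = c(3, 5, 10), num_fold_cols = 3)  # assumes vector `k` is supported

cross_validate(
  dat,
  formulas  = "score ~ age",
  family    = "gaussian",
  fold_cols = paste0(".folds_", 1:3)  # results are averaged across the fold columns
)
```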
In the Class Level Results in multinomial evaluations, the nested Confusion Matrix and Results tibbles are now named with their class to ease extraction and further work with these tibbles. The Results tibble further gets a Class column. This information might be redundant, but could make life easier.
Adds vignette: Multiple-k: Picking the number of folds for cross-validation.
Fixes bug in plot_confusion_matrix(), where tiles with a count > 0 but a rounded percentage of 0 did not have the percentage text. Only tiles with a count of 0 should now be without text.
Breaking change: In plot_confusion_matrix(), the targets_col and predictions_col arguments have been renamed to target_col and prediction_col to be consistent with evaluate().
Breaking change: In evaluate_residuals(), the targets_col and predictions_col arguments have been renamed to target_col and prediction_col to be consistent with evaluate().
Breaking change: In process_info_gaussian/binomial/multinomial(), the targets_col argument has been renamed to target_col to be consistent with evaluate().
In binomial most_challenging(), the probabilities are now properly of the second class alphabetically.
In plot_confusion_matrix(), adds the class_order argument for manually setting the order of the classes in the facets.
In plot_confusion_matrix(), tiles with a count of 0 no longer have text in the tile by default. This adds the rm_zero_percentages (for column/row percentages) and rm_zero_text (for counts and normalized) arguments.
In plot_confusion_matrix(), adds optional sum tiles. Enabling this (add_sums = TRUE) adds an extra column and an extra row with the sums. The corner tile contains the total count. This adds the add_sums and sums_settings arguments. A sum_tile_settings() function has been added to control the appearance of these tiles. Thanks to @MaraAlexeev for the idea.
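A hedged sketch of the sum tiles. The data is made up, and the sum_tile_settings() arguments shown (palette, label) are assumptions based on memory of its documentation:

```r
# Hedged sketch: adding sum tiles (made-up data; settings illustrative).
library(cvms)

set.seed(1)
targets     <- sample(c("A", "B", "C"), 150, replace = TRUE)
predictions <- sample(c("A", "B", "C"), 150, replace = TRUE)
conf_mat    <- confusion_matrix(targets = targets, predictions = predictions)

plot_confusion_matrix(
  conf_mat$`Confusion Matrix`[[1]],
  add_sums      = TRUE,            # extra row/column; corner tile shows the total count
  sums_settings = sum_tile_settings(
    palette = "Oranges",           # assumed argument: palette for the sum tiles
    label   = "Total"              # assumed argument: label for the sum tiles
  )
)
```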
In plot_confusion_matrix(), adds an option (intensity_by) to set the color intensity of the tiles to the overall percentages (normalized).
In plot_confusion_matrix(), adds an option to only have row and column percentages in the diagonal tiles. Thanks to @xgirouxb for the idea.
Adds Process information to the output, with the settings used, for added transparency. It has a custom print method, making it easy to read. Underneath, it is a list, so all information is available using $ or similar. In most cases, the Family information has been moved into the Process object. Thanks to @daviddalpiaz for notifying me of the need for more transparency.
In outputs, the Family information is (in most cases) moved into the new Process object.
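A hedged sketch of inspecting the Process information. The data and column names are made up; Process is accessed as the nested list column described above:

```r
# Hedged sketch: accessing the Process information (made-up data).
library(cvms)

set.seed(1)
dat <- data.frame(
  target     = sample(c("yes", "no"), 50, replace = TRUE),
  prediction = runif(50)
)

ev <- evaluate(
  data            = dat,
  target_col      = "target",
  prediction_cols = "prediction",
  type            = "binomial"
)

proc <- ev$Process[[1]]
proc          # the custom print method gives a readable summary of the settings
names(proc)   # underneath it is a list, so elements can be pulled out with `$`
```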
In binomial evaluate() and baseline(), Accuracy is now enabled by default. It is still disabled in the cross_validate*() functions to guide users away from using it as the main criterion for model selection (it is well known to many, but can be quite misleading with imbalanced datasets).
Fixes: In binomial evaluation, the probabilities are now properly of the second class alphabetically. When the target column was a factor where the levels were not in alphabetical order, the second level in that order was used. The levels are now sorted before extraction. Thanks to @daviddalpiaz for finding the bug.
Fixes: In grouped multinomial evaluation, when predictions are classes and there are different sets of classes per group, only the classes in the subset are used.
Fixes: Bug where the ROC direction parameter was set incorrectly when positive is numeric. In regression tests, the AUC scores were not impacted.
Fixes: 2-class multinomial evaluation returns all expected metrics.
In multinomial evaluation, the Class Level Results are sorted by the Class column.
Imports broom.mixed to allow tidying of coefficients from lme4::lmer models.
Exports the process_info_binomial(), process_info_multinomial(), and process_info_gaussian() constructors to ensure the various methods are available. They are not necessarily intended for external use.
Compatibility with dplyr version 1.0.0. NOTE: this version of dplyr slows down some functions in cvms significantly, so it might be beneficial not to update before dplyr version 1.1.0, which is supposed to tackle this problem.
rsvg and ggimage are now only suggested, and plot_confusion_matrix() throws a warning if either is not installed.
Additional input checks for evaluate().
In cross_validate() and validate(), the models argument is renamed to formulas. This is a more meaningful name that was recently introduced in cross_validate_fn(). For now, the models argument is deprecated, will be used instead of formulas if specified, and will throw a warning.
In cross_validate() and validate(), the model_verbose argument is renamed to verbose. This is a more meaningful name that was recently introduced in cross_validate_fn(). For now, the model_verbose argument is deprecated, will be used instead of verbose if specified, and will throw a warning.
In cross_validate() and validate(), the link argument is removed. Consider using cross_validate_fn() or validate_fn() instead, where you have full control over the prediction type fed to the evaluation.
In cross_validate_fn(), the predict_type argument is removed. You now have to pass a predict function, as that is safer and more transparent.
In functions with a family/type argument, this argument no longer has a default, forcing the user to specify the family/type of the task. This also means that arguments have been reordered. In general, it is safer to name arguments when passing values to them.
In evaluate(), apply_softmax now defaults to FALSE.
Throws an error if probabilities do not add up to 1 row-wise (tolerance of 5 decimals) when type is multinomial.
Multinomial MCC is now the proper multiclass generalization. Previous versions used macro MCC. Removes MCC from the class level results. Removes the option to enable Weighted MCC.
Multinomial AUC is calculated with pROC::multiclass.roc() instead of in the one-vs-all evaluations. This removes AUC, Lower CI, and Upper CI from the Class Level Results and removes Lower CI and Upper CI from the main output tibble. Also removes the option to enable "Weighted AUC", "Weighted Lower CI", and "Weighted Upper CI".
Multinomial AUC is disabled by default, as it can take a long time to calculate for a large set of classes.
ROC columns now return the ROC objects instead of the extracted sensitivities and specificities, both of which can be extracted from the objects.
In evaluate(), it's no longer possible to pass model objects. It now only evaluates the predictions. This removes the AIC, AICc, BIC, r2m, and r2c metrics.
In cross_validate() and validate(), the r2m and r2c metrics are now disabled by default in gaussian. The r-squared metrics are non-predictive and should not be used for model selection. They can be enabled with metrics = list("r2m" = TRUE, "r2c" = TRUE).
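A hedged sketch of re-enabling these metrics; the data and the groupdata2::fold() preparation are illustrative:

```r
# Hedged sketch: re-enabling r2m/r2c in a gaussian cross-validation (made-up data).
library(cvms)
library(groupdata2)

set.seed(1)
dat <- fold(data.frame(score = rnorm(40), age = rnorm(40, mean = 40, sd = 10)), k = 4)

cross_validate(
  dat,
  formulas = "score ~ age",
  family   = "gaussian",
  metrics  = list("r2m" = TRUE, "r2c" = TRUE)  # as described in the entry above
)
```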
In cross_validate_fn(), the AIC, AICc, BIC, r2m, and r2c metrics are now disabled by default in gaussian. Only some model types allow the computation of those metrics, and it is preferable that the user actively makes a choice to include them.
In baseline(), the AIC, AICc, BIC, r2m, and r2c metrics are now disabled by default in gaussian.
It can be unclear whether the IC metrics (computed on the lm()/lmer() model objects) can be compared to those calculated for a given other model function. To avoid such confusion, it is preferable that the user actively makes a choice to include the metrics. The r-squared metrics will only be non-zero when random effects are passed. Given that we shouldn't use the r-squared metrics for model selection, it makes sense not to have them enabled by default.
validate() now returns a tibble with the model objects nested in the Model column. Previously, it returned a list with the results and models. This allows for easier use in magrittr pipelines (%>%).
In multinomial baseline(), the aggregation approach is changed. The summarized results now properly describe the random evaluations tibble, except for the four new measures CL_Max, CL_Min, CL_NAs, and CL_INFs, which describe the class level results. Previously, NAs were removed before aggregating the one-vs-all evaluations, meaning that some metric summaries could become inflated if small classes had NAs. It was also non-transparent that the NAs and INFs were counted in the class level results instead of being a count of random evaluations with NAs or INFs.
cv_plot() is removed. It wasn't very useful and has never been developed properly. We aim to provide specialized plotting functions instead.
validate_fn() is added. Validate your custom model function on a test set.
confusion_matrix() is added. Create a confusion matrix and calculate associated metrics from your targets and predictions.
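A hedged sketch; the vectors are made up and the shown column names follow the cvms naming conventions as I recall them:

```r
# Hedged sketch: a confusion matrix with associated metrics (made-up vectors).
library(cvms)

targets     <- c("cat", "dog", "dog", "cat", "dog", "cat")
predictions <- c("cat", "dog", "cat", "cat", "dog", "dog")

cm <- confusion_matrix(targets = targets, predictions = predictions)
cm$`Balanced Accuracy`       # metrics are columns in the returned tibble
cm$`Confusion Matrix`[[1]]   # nested tibble with the counts
```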
evaluate_residuals() is added. Calculate common metrics from regression residuals.
summarize_metrics() is added. Use it to summarize the numeric columns in your dataset with a set of common descriptors. Counts the NAs and Infs. Used by baseline().
select_definitions() is added. Select the columns that define the models, such as Dependent, Fixed, Random, and the (unnested) hyperparameters.
model_functions() is added. Contains simple model_fn examples that can be used in cross_validate_fn() and validate_fn() or as starting points.
predict_functions() is added. Contains simple predict_fn examples that can be used in cross_validate_fn() and validate_fn() or as starting points.
preprocess_functions() is added. Contains simple preprocess_fn examples that can be used in cross_validate_fn() and validate_fn() or as starting points.
update_hyperparameters() is added. For managing hyperparameters when writing custom model functions.
most_challenging() is added. Finds the data points that were the most difficult to predict.
plot_confusion_matrix() is added. Creates a ggplot representing a given confusion matrix. Thanks to Malte Lau Petersen (@maltelau), Maris Sala (@marissala), and Kenneth Enevoldsen (@KennethEnevoldsen) for feedback.
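A hedged sketch of plotting the confusion matrix from an evaluate() call. The data is made up, and the nested `Confusion Matrix` extraction is an assumption based on the output format described elsewhere in this file:

```r
# Hedged sketch: plotting the confusion matrix from a multinomial evaluation.
library(cvms)

set.seed(1)
dat <- data.frame(
  target     = sample(c("A", "B", "C"), 100, replace = TRUE),
  prediction = sample(c("A", "B", "C"), 100, replace = TRUE)
)

ev <- evaluate(
  data            = dat,
  target_col      = "target",
  prediction_cols = "prediction",
  type            = "multinomial"
)

plot_confusion_matrix(ev$`Confusion Matrix`[[1]])
```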
plot_metric_density() is added. Creates a ggplot density plot for a metric column.
font() is added. Utility for setting font settings (size, color, etc.) in plotting functions.
simplify_formula() is added. Converts a formula with inline functions to a simple formula where all variables are added together (e.g. y ~ x*z + log(a) + (1|b) -> y ~ x + z + a + b). This is useful when passing a formula to recipes::recipe(), which doesn't allow the inline functions.
gaussian_metrics(), binomial_metrics(), and multinomial_metrics() are added. They can be used to select metrics for the metrics argument in many cvms functions.
baseline_gaussian(), baseline_binomial(), and baseline_multinomial() are added. Simple wrappers for baseline() that are easier to use and have simpler help files. baseline() has a lot of arguments that are specific to a family, which can be a bit confusing.
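A hedged sketch of one of the wrappers. The data is made up, and the argument names (test_data, dependent_col, n) are assumptions to check against the help file:

```r
# Hedged sketch: a binomial baseline from random predictions (made-up data).
library(cvms)

set.seed(1)
test_data <- data.frame(diagnosis = sample(c("sick", "healthy"), 40, replace = TRUE))

baseline_binomial(
  test_data     = test_data,
  dependent_col = "diagnosis",
  n             = 100   # number of random evaluations to summarize
)
```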
The wines dataset is added. Contains a list of wine varieties in an approximately Zipfian distribution.
The musicians dataset is added. This has been generated for multiclass classification examples.
The predicted.musicians dataset is added. This contains cross-validated predictions of the musicians dataset by three algorithms. Can be used to demonstrate working with predictions from repeated 5-fold stratified cross-validation.
Adds NRMSE(RNG), NRMSE(IQR), NRMSE(STD), and NRMSE(AVG) metrics to gaussian evaluations. The RMSE is normalized by either the target range (RNG), target interquartile range (IQR), target standard deviation (STD), or target mean (AVG). Only NRMSE(IQR) is enabled by default.
Adds RMSLE, RAE, RSE, RRSE, MALE, MAPE, MSE, TAE, and TSE metrics to gaussian evaluations. RMSLE, RAE, and RRSE are enabled by default.
Adds Information Criterion metrics (AIC, AICc, BIC) to the binomial and multinomial output of some functions (disabled by default). These are based on the fitted model objects and will only work for some types of models.
Adds Positive Class column to binomial evaluations.
Adds optional hyperparameters argument to cross_validate_fn(). Pass a list of hyperparameters, and every combination of these will be cross-validated.
Adds optional preprocess_fn argument to cross_validate_fn(). This can, for instance, be used to standardize the training and test sets within the function, e.g. by extracting the scaling and centering parameters from the training set and applying them to both the training set and the test fold.
Adds Preprocess column to the output when preprocess_fn is passed. Contains returned parameters (e.g. mean, sd) used in the preprocessing.
Adds preprocess_once argument to cross_validate_fn(). When preprocessing does not depend on the current formula or hyperparameters, we might as well perform it once per train/test split, instead of for every model.
Adds metrics argument to baseline(). Enable the non-default metrics you want a baseline evaluation for.
Adds preprocessing argument to cross_validate() and validate(). Currently allows "standardize", "scale", "center", and "range". Results will likely not be affected noticeably by the preprocessing.
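A hedged sketch of the preprocessing argument; the data and fold preparation are illustrative:

```r
# Hedged sketch: within-split standardization via the preprocessing argument.
library(cvms)
library(groupdata2)

set.seed(1)
dat <- fold(data.frame(score = rnorm(40), age = rnorm(40, mean = 40, sd = 10)), k = 4)

cross_validate(
  dat,
  formulas      = "score ~ age",
  family        = "gaussian",
  preprocessing = "standardize"  # also: "scale", "center", "range"
)
```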
Adds add_targets and add_predicted_classes arguments to multiclass_probability_tibble().
Adds Observation column to the nested predictions tibble in cross_validate(), cross_validate_fn(), validate(), and validate_fn(). These indices can be used to identify which observations are difficult to predict.
Adds SD column to the nested predictions tibble in evaluate() when performing ID-aggregated evaluation with id_method = 'mean'. This is the standard deviation of the predictions for the ID.
Adds vignette: Cross-validating custom model functions with cvms
Adds vignette: Creating a confusion matrix with cvms
Adds vignette: The available metrics in cvms
Adds vignette: Evaluate by ID/group
The metrics argument now allows setting a boolean for "all" inside the list to enable or disable all the metrics. For instance, the following would disable all the metrics except RMSE: metrics = list("all" = FALSE, "RMSE" = TRUE).
Multinomial evaluation results now contain the Results tibble with the results for each fold column. The main metrics are now averages of these fold column results. Previously, they were not aggregated by fold column first. In the unit tests, this has not altered the results, but it is a more correct approach.
The prediction column(s) in evaluate() must be either numeric or character, depending on the format chosen.
In binomial evaluate(), it's now possible to pass predicted classes instead of probabilities. Probabilities still carry more information, though. Both the prediction and target columns must have type character in this format.
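A hedged sketch of the class-prediction format; the data is made up:

```r
# Hedged sketch: binomial evaluation from predicted classes (made-up data).
# Both the target and prediction columns are character, as required here.
library(cvms)

dat <- data.frame(
  target     = c("yes", "no", "yes", "yes", "no", "no"),
  prediction = c("yes", "no", "no",  "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

evaluate(
  data            = dat,
  target_col      = "target",
  prediction_cols = "prediction",
  type            = "binomial"
)
```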
Changes the required arguments in the predict_fn function passed to cross_validate_fn().
Changes the required arguments in the model_fn function passed to cross_validate_fn().
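Below is a hedged sketch of the kind of model_fn/predict_fn pair cross_validate_fn() expects after these changes. The required argument names (train_data, formula, hyperparameters; test_data, model, formula, hyperparameters, train_data) are assumptions from memory of the documentation, so check ?cross_validate_fn before relying on them; the data is made up:

```r
# Hedged sketch: custom model/predict functions for cross_validate_fn().
# Argument names below are assumptions -- verify against ?cross_validate_fn.
library(cvms)
library(groupdata2)

set.seed(1)
dat <- fold(data.frame(score = rnorm(40), age = rnorm(40, mean = 40, sd = 10)), k = 4)

lm_model_fn <- function(train_data, formula, hyperparameters) {
  # Fit a simple linear model on the training fold
  lm(formula = formula, data = train_data)
}

lm_predict_fn <- function(test_data, model, formula, hyperparameters, train_data) {
  # Predict the dependent variable for the test fold
  predict(model, newdata = test_data)
}

cross_validate_fn(
  dat,
  formulas   = c("score ~ age", "score ~ 1"),
  type       = "gaussian",
  model_fn   = lm_model_fn,
  predict_fn = lm_predict_fn
)
```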
Warnings and messages from preprocess_fn are caught and added to Warnings and Messages. Warnings are counted in Other Warnings.
Nesting is now done with dplyr::group_nest instead of tidyr::nest_legacy for speed improvements.
caret, mltools, and ModelMetrics are no longer dependencies. The confusion matrix metrics have instead been implemented in cvms (see confusion_matrix()).
select_metrics() now works with a wider range of inputs, as it no longer depends on a Family column.
The Fixed column in some of the output tibbles has been moved to make it clearer which model was evaluated.
Better handling of inline functions in formulas.
Fixes bug in evaluate() when used on a grouped data frame. The row order in the output was not guaranteed to fit the grouping keys.
Fixes documentation in cross_validate_fn(). The examples section contained an unreasonable number of mistakes :-)
In cross_validate_fn(), warnings and messages from the predict function are now included in Warnings and Messages. The warnings are counted in Other Warnings.
Breaking change: In evaluate(), when type is multinomial, the output is now a single tibble. The Class Level Results are included as a nested tibble.
Breaking change: In baseline(), lmer models are now fitted with REML = FALSE by default.
Adds REML argument to baseline().
cross_validate_fn() is added. Cross-validate custom model functions.
Bug fix: The control argument in cross_validate() was not being used. Now it is.
In cross_validate(), the model is no longer fitted twice when a warning is thrown during fitting.
Adds metrics argument to cross_validate() and validate(). Allows enabling the regular Accuracy metric in binomial, or disabling metrics (they will currently still be computed, just not included in the output).
AICc is now computed with the MuMIn package instead of the AICcmodavg package, which is no longer a dependency.
Adds lifecycle badges to the function documentation.
evaluate() is added. Evaluate your model's predictions with the same metrics as used in cross_validate().
Adds 'multinomial' family/type to baseline() and evaluate().
Adds multiclass_probability_tibble() for generating a random probability tibble.
Adds random_effects argument to baseline() for adding random effects to the Gaussian baseline model.
Adds Zenodo DOI for easier citation.
In nested confusion matrices, the Reference column is renamed to Target, to use the same naming scheme as in the nested predictions.
Bug fix: p-values are correctly added to the nested coefficients tibble. Adds tests of this table as well.
Adds extra unit tests to increase code coverage.
When the argument model_verbose is TRUE, the used model function is now messaged instead of printed.
Adds badges to README, including travis-ci status, AppVeyor status, Codecov, min. required R version, CRAN version and monthly CRAN downloads. Note: Zenodo badge will be added post release.
The minimum required R version is now 3.5.
Adds optional parallelization.
Results now contain a count of singular fit messages. See ?lme4::isSingular for more information.
The argument positive changes its default value to 2. It now takes either 1 or 2 (previously 0 and 1). If your dependent variable has the values 0 and 1, 1 is now the positive class by default.
The AUC calculation has changed. It now explicitly sets the direction in pROC::roc.
Unit tests have been updated for the new random sampling generator in R 3.6.0. They will NOT run on previous versions of R.
Adds baseline() for creating baseline evaluations.
Adds reconstruct_formulas() for reconstructing formulas based on model definition columns in the results tibble.
Adds combine_predictors() for generating model formulas from a set of fixed effects.
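A hedged sketch; the argument names (dependent, fixed_effects) are assumptions from memory of the help file:

```r
# Hedged sketch: generating model formulas from a set of fixed effects.
library(cvms)

combine_predictors(
  dependent     = "score",
  fixed_effects = c("age", "diagnosis", "session")
)
# Returns formula strings such as "score ~ age", "score ~ age + diagnosis",
# "score ~ age * diagnosis + session", and so on.
```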
Adds select_metrics() for quickly selecting the metrics and model definition columns.
Breaking change: Metrics have been rearranged and a few metrics have been added.
Breaking change: Renamed argument folds_col to fold_cols to better fit the new repeated cross-validation option.
New: repeated cross-validation.
Created package :)