Title: | Rearranging Data |
---|---|
Description: | Arrange data by a set of methods. Use rearrangers to reorder data points and mutators to change their values. From basic utilities, to centering the greatest value, to swirling in 3-dimensional space, 'rearrr' enables creativity when plotting and experimenting with data. |
Authors: | Ludvig Renbo Olsen [aut, cre] (<https://orcid.org/0009-0006-6798-7454>, @ludvigolsen) |
Maintainer: | Ludvig Renbo Olsen <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.4.9000 |
Built: | 2025-02-05 05:11:50 UTC |
Source: | https://github.com/ludvigolsen/rearrr |
Calculates the angle between each data point and the origin
with:
And converts to degrees [0-360)
, measured counterclockwise from the {x > x1, y = y1}
line.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and finding the angle to e.g. the centroid
of each group.
angle( data, x_col = NULL, y_col = NULL, origin = NULL, origin_fn = NULL, degrees_col_name = ".degrees", origin_col_name = ".origin", overwrite = FALSE )
angle( data, x_col = NULL, y_col = NULL, origin = NULL, origin_fn = NULL, degrees_col_name = ".degrees", origin_col_name = ".origin", overwrite = FALSE )
data |
|
x_col |
Name of x column in |
y_col |
Name of y column in |
origin |
Coordinates of the origin to calculate angle to.
A scalar to use in all dimensions
or a N.B. Ignored when |
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
degrees_col_name |
Name of new column with the degrees. |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
data.frame
(tibble
) with the additional columns (degrees and origin coordinates).
Ludvig Renbo Olsen, [email protected]
Other measuring functions:
distance()
,
vector_length()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Calculate angles in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 df_angles <- angle( data = df, x_col = "x", y_col = "y", origin = c(0.5, 0.5) ) df_angles # Plot points with degrees # Degrees are measured counterclockwise around the # positive side of the x-axis if (has_ggplot){ df_angles %>% ggplot(aes(x = x, y = y, color = .degrees)) + geom_segment(aes(x = 0.5, xend = 1, y = 0.5, yend = 0.5), color = "magenta") + geom_point() + theme_minimal() } # Calculate angles to the centroid for each group in 'g' angle( data = dplyr::group_by(df, g), x_col = "x", y_col = "y", origin_fn = centroid )
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Calculate angles in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 df_angles <- angle( data = df, x_col = "x", y_col = "y", origin = c(0.5, 0.5) ) df_angles # Plot points with degrees # Degrees are measured counterclockwise around the # positive side of the x-axis if (has_ggplot){ df_angles %>% ggplot(aes(x = x, y = y, color = .degrees)) + geom_segment(aes(x = 0.5, xend = 1, y = 0.5, yend = 0.5), color = "magenta") + geom_point() + theme_minimal() } # Calculate angles to the centroid for each group in 'g' angle( data = dplyr::group_by(df, g), x_col = "x", y_col = "y", origin_fn = centroid )
Perform matrix multiplication with a transformation matrix and a set of data.frame
columns.
The data points in `data`
are moved prior to the transformation, to bring
the origin to 0
in all dimensions. After the transformation, the
inverse move is applied to bring the origin back to its original position. See `Details`
section.
The columns in `data`
are transposed, making the operation (without the origin movement):
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and transforming around e.g. the centroid
of each group.
apply_transformation_matrix( data, mat, cols, origin = NULL, origin_fn = NULL, suffix = "_transformed", keep_original = TRUE, origin_col_name = ".origin", overwrite = FALSE )
apply_transformation_matrix( data, mat, cols, origin = NULL, origin_fn = NULL, suffix = "_transformed", keep_original = TRUE, origin_col_name = ".origin", overwrite = FALSE )
data |
|
mat |
Transformation |
cols |
Columns to mutate values of. Must be specified when |
origin |
Coordinates of the origin. |
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Example with 2 columns (x
, y
) and a 2x2 transformation matrix:
Move origin to (0, 0)
:
x = x - origin_x
y = y - origin_y
Convert to transposed matrix:
data_mat = rbind(x, y)
Matrix multiplication:
transformed = mat %*% data_mat
Move origin to original position (after extraction from transformed
):
x = x + origin_x
y = y + origin_y
data.frame
(tibble
) with the new, transformed columns and the origin coordinates.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
cluster_groups()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
flip_values()
,
roll_values()
,
rotate_2d()
,
rotate_3d()
,
shear_2d()
,
shear_3d()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(3) # Create a data frame df <- data.frame( "x" = 1:12, "y" = 13:24, "z" = runif(12), "g" = c( 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3 ) ) # Apply identity matrix mat <- matrix(c(1, 0, 0, 0, 1, 0, 0, 0, 1), nrow = 3) apply_transformation_matrix( data = df, mat = mat, cols = c("x", "y", "z"), origin = c(0, 0, 0) ) # Apply rotation matrix # 90 degrees around z-axis # Origin is the most centered point mat <- matrix(c(0, 1, 0, -1, 0, 0, 0, 0, 1), nrow = 3) res <- apply_transformation_matrix( data = df, mat = mat, cols = c("x", "y", "z"), origin_fn = most_centered ) # Plot the rotation # z wasn't changed so we plot x and y if (has_ggplot){ res %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_transformed, y = y_transformed)) + theme_minimal() } # Apply rotation matrix to grouped data frame # Around centroids # Same matrix as before res <- apply_transformation_matrix( data = dplyr::group_by(df, g), mat = mat, cols = c("x", "y", "z"), origin_fn = centroid ) # Plot the rotation if (has_ggplot){ res %>% ggplot(aes(x = x, y = y, color = g)) + geom_point() + geom_point(aes(x = x_transformed, y = y_transformed)) + theme_minimal() }
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(3) # Create a data frame df <- data.frame( "x" = 1:12, "y" = 13:24, "z" = runif(12), "g" = c( 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3 ) ) # Apply identity matrix mat <- matrix(c(1, 0, 0, 0, 1, 0, 0, 0, 1), nrow = 3) apply_transformation_matrix( data = df, mat = mat, cols = c("x", "y", "z"), origin = c(0, 0, 0) ) # Apply rotation matrix # 90 degrees around z-axis # Origin is the most centered point mat <- matrix(c(0, 1, 0, -1, 0, 0, 0, 0, 1), nrow = 3) res <- apply_transformation_matrix( data = df, mat = mat, cols = c("x", "y", "z"), origin_fn = most_centered ) # Plot the rotation # z wasn't changed so we plot x and y if (has_ggplot){ res %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_transformed, y = y_transformed)) + theme_minimal() } # Apply rotation matrix to grouped data frame # Around centroids # Same matrix as before res <- apply_transformation_matrix( data = dplyr::group_by(df, g), mat = mat, cols = c("x", "y", "z"), origin_fn = centroid ) # Plot the rotation if (has_ggplot){ res %>% ggplot(aes(x = x, y = y, color = g)) + geom_point() + geom_point(aes(x = x_transformed, y = y_transformed)) + theme_minimal() }
The highest value is positioned in the middle with the other values decreasing around it.
Example:
The column values:
c(1, 2, 3, 4,
5
)
are ordered as:
c(1, 3,
5
, 4, 2)
center_max(data, col = NULL, shuffle_sides = FALSE)
center_max(data, col = NULL, shuffle_sides = FALSE)
data |
|
col |
Column to create sorting factor by.
When |
shuffle_sides |
Whether to shuffle which elements are left and right of the center. (Logical) |
The sorted data.frame
(tibble
) / vector
.
Ludvig Renbo Olsen, [email protected]
Other rearrange functions:
center_min()
,
closest_to()
,
furthest_from()
,
pair_extremes()
,
position_max()
,
position_min()
,
rev_windows()
,
roll_elements()
,
shuffle_hierarchy()
,
triplet_extremes()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Center by the index (row numbers) center_max(df) # Center by each of the columns center_max(df, col = "A") center_max(df, col = "B") center_max(df, col = "C") # Randomize which elements are left and right of the center center_max(df, col = "A", shuffle_sides = TRUE) # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% center_max(col = "A") # Plot the centered values plot(x = 1:10, y = center_max(df, col = "B")$B) plot(x = 1:10, y = center_max(df, col = "B", shuffle_sides = TRUE)$B)
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Center by the index (row numbers) center_max(df) # Center by each of the columns center_max(df, col = "A") center_max(df, col = "B") center_max(df, col = "C") # Randomize which elements are left and right of the center center_max(df, col = "A", shuffle_sides = TRUE) # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% center_max(col = "A") # Plot the centered values plot(x = 1:10, y = center_max(df, col = "B")$B) plot(x = 1:10, y = center_max(df, col = "B", shuffle_sides = TRUE)$B)
The lowest value is positioned in the middle with the other values increasing around it.
Example:
The column values:
c(
1
, 2, 3, 4, 5)
are ordered as:
c(5, 3,
1
, 2, 4)
center_min(data, col = NULL, shuffle_sides = FALSE)
center_min(data, col = NULL, shuffle_sides = FALSE)
data |
|
col |
Column to create sorting factor by.
When |
shuffle_sides |
Whether to shuffle which elements are left and right of the center. (Logical) |
The sorted data.frame
(tibble
) / vector
.
Ludvig Renbo Olsen, [email protected]
Other rearrange functions:
center_max()
,
closest_to()
,
furthest_from()
,
pair_extremes()
,
position_max()
,
position_min()
,
rev_windows()
,
roll_elements()
,
shuffle_hierarchy()
,
triplet_extremes()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Center by the index (row numbers) center_min(df) # Center by each of the columns center_min(df, col = "A") center_min(df, col = "B") center_min(df, col = "C") # Randomize which elements are left and right of the center center_min(df, col = "A", shuffle_sides = TRUE) # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% center_min(col = "A") # Plot the centered values plot(x = 1:10, y = center_min(df, col = "B")$B) plot(x = 1:10, y = center_min(df, col = "B", shuffle_sides = TRUE)$B)
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Center by the index (row numbers) center_min(df) # Center by each of the columns center_min(df, col = "A") center_min(df, col = "B") center_min(df, col = "C") # Randomize which elements are left and right of the center center_min(df, col = "A", shuffle_sides = TRUE) # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% center_min(col = "A") # Plot the centered values plot(x = 1:10, y = center_min(df, col = "B")$B) plot(x = 1:10, y = center_min(df, col = "B", shuffle_sides = TRUE)$B)
Calculates the mean of each passed vector
/column
.
centroid(..., cols = NULL, na.rm = FALSE)
centroid(..., cols = NULL, na.rm = FALSE)
... |
|
cols |
Names of columns to use when |
na.rm |
Whether to ignore missing values when calculating means. (Logical) |
Means of the supplied vectors
/columns
. Either as a vector
or a data.frame
.
Ludvig Renbo Olsen, [email protected]
Other coordinate functions:
create_origin_fn()
,
is_most_centered()
,
midrange()
,
most_centered()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Find centroid coordinates # Aka. the means of each vector centroid(x, y, z) # # For data.frames # # Create data frame df <- data.frame( "x" = x, "y" = y, "z" = z, "g" = rep(1:2, each = 5) ) # Find centroid coordinates # Aka. the means of each column centroid(df, cols = c("x", "y", "z")) # When 'df' is grouped df %>% dplyr::group_by(g) %>% centroid(cols = c("x", "y", "z"))
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Find centroid coordinates # Aka. the means of each vector centroid(x, y, z) # # For data.frames # # Create data frame df <- data.frame( "x" = x, "y" = y, "z" = z, "g" = rep(1:2, each = 5) ) # Find centroid coordinates # Aka. the means of each column centroid(df, cols = c("x", "y", "z")) # When 'df' is grouped df %>% dplyr::group_by(g) %>% centroid(cols = c("x", "y", "z"))
Create the x-coordinates for a vector
of y-coordinates such that
they form a circle.
This will likely look most like a circle when the y-coordinates are somewhat equally distributed, e.g. a uniform distribution.
circularize( data, y_col = NULL, .min = NULL, .max = NULL, offset_x = 0, keep_original = TRUE, x_col_name = ".circle_x", degrees_col_name = ".degrees", origin_col_name = ".origin", overwrite = FALSE )
circularize( data, y_col = NULL, .min = NULL, .max = NULL, offset_x = 0, keep_original = TRUE, x_col_name = ".circle_x", degrees_col_name = ".degrees", origin_col_name = ".origin", overwrite = FALSE )
data |
|
y_col |
Name of column in |
.min |
Minimum y-coordinate. If |
.max |
Maximum y-coordinate. If |
offset_x |
Value to offset the x-coordinates by. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
x_col_name |
Name of new column with the x-coordinates. |
degrees_col_name |
Name of new column with the angles in degrees. If Angling is counterclockwise around |
origin_col_name |
Name of new column with the origin coordinates (center of circle). If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
data.frame
(tibble
) with the added x-coordinates and the angle in degrees.
Ludvig Renbo Olsen, [email protected]
Other forming functions:
hexagonalize()
,
square()
,
triangularize()
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "y" = runif(200), "g" = factor(rep(1:5, each = 40)) ) # Circularize 'y' df_circ <- circularize(df, y_col = "y") df_circ # Plot circle if (has_ggplot){ df_circ %>% ggplot(aes(x = .circle_x, y = y, color = .degrees)) + geom_point() + theme_minimal() } # # Grouped circularization # # Circularize 'y' for each group # First cluster the groups a bit to move the # circles away from each other df_circ <- df %>% cluster_groups( cols = "y", group_cols = "g", suffix = "", overwrite = TRUE ) %>% dplyr::group_by(g) %>% circularize( y_col = "y", overwrite = TRUE ) # Plot circles if (has_ggplot){ df_circ %>% ggplot(aes(x = .circle_x, y = y, color = g)) + geom_point() + theme_minimal() } # # Specifying minimum value # # Specify minimum value manually df_circ <- circularize(df, y_col = "y", .min = -2) df_circ # Plot circle if (has_ggplot){ df_circ %>% ggplot(aes(x = .circle_x, y = y, color = .degrees)) + geom_point() + theme_minimal() } # # Multiple circles by contraction # # Start by circularizing 'y' df_circ <- circularize(df, y_col = "y") # Contract '.circle_x' and 'y' towards the centroid # To contract with multiple multipliers at once, # we wrap the call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = 1:10 / 10, .f = function(mult) { expand_distances( data = df_circ, cols = c(".circle_x", "y"), multiplier = mult, origin_fn = centroid, overwrite = TRUE ) } ) df_expanded if (has_ggplot){ df_expanded %>% ggplot(aes( x = .circle_x_expanded, y = y_expanded, color = .degrees, alpha = .multiplier )) + geom_point() + theme_minimal() }
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "y" = runif(200), "g" = factor(rep(1:5, each = 40)) ) # Circularize 'y' df_circ <- circularize(df, y_col = "y") df_circ # Plot circle if (has_ggplot){ df_circ %>% ggplot(aes(x = .circle_x, y = y, color = .degrees)) + geom_point() + theme_minimal() } # # Grouped circularization # # Circularize 'y' for each group # First cluster the groups a bit to move the # circles away from each other df_circ <- df %>% cluster_groups( cols = "y", group_cols = "g", suffix = "", overwrite = TRUE ) %>% dplyr::group_by(g) %>% circularize( y_col = "y", overwrite = TRUE ) # Plot circles if (has_ggplot){ df_circ %>% ggplot(aes(x = .circle_x, y = y, color = g)) + geom_point() + theme_minimal() } # # Specifying minimum value # # Specify minimum value manually df_circ <- circularize(df, y_col = "y", .min = -2) df_circ # Plot circle if (has_ggplot){ df_circ %>% ggplot(aes(x = .circle_x, y = y, color = .degrees)) + geom_point() + theme_minimal() } # # Multiple circles by contraction # # Start by circularizing 'y' df_circ <- circularize(df, y_col = "y") # Contract '.circle_x' and 'y' towards the centroid # To contract with multiple multipliers at once, # we wrap the call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = 1:10 / 10, .f = function(mult) { expand_distances( data = df_circ, cols = c(".circle_x", "y"), multiplier = mult, origin_fn = centroid, overwrite = TRUE ) } ) df_expanded if (has_ggplot){ df_expanded %>% ggplot(aes( x = .circle_x_expanded, y = y_expanded, color = .degrees, alpha = .multiplier )) + geom_point() + theme_minimal() }
Values are ordered by how close they are to the origin.
In 1d (when
`cols`
has length 1
), the origin can be thought of as a target value.
In n dimensions, the origin can be thought of as coordinates.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and ordering the rows by
their distance to the centroid of each group.
The *_vec()
version takes and returns a vector
.
Example:
The column values:
c(1, 2, 3, 4, 5)
and origin = 2
are ordered as:
c(
2
, 1, 3, 4, 5)
closest_to( data, cols = NULL, origin = NULL, origin_fn = NULL, shuffle_ties = FALSE, origin_col_name = ".origin", distance_col_name = ".distance", overwrite = FALSE ) closest_to_vec(data, origin = NULL, origin_fn = NULL, shuffle_ties = FALSE)
closest_to( data, cols = NULL, origin = NULL, origin_fn = NULL, shuffle_ties = FALSE, origin_col_name = ".origin", distance_col_name = ".distance", overwrite = FALSE ) closest_to_vec(data, origin = NULL, origin_fn = NULL, shuffle_ties = FALSE)
data |
|
cols |
Column(s) to create sorting factor by.
When |
origin |
Coordinates of the origin to calculate distances to.
A scalar to use in all dimensions
or a N.B. Ignored when |
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
shuffle_ties |
Whether to shuffle elements with the same distance to the origin. (Logical) |
origin_col_name |
Name of new column with the origin coordinates. If |
distance_col_name |
Name of new column with the distances to the origin. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
The sorted data.frame
(tibble
) / vector
.
Ludvig Renbo Olsen, [email protected]
Other rearrange functions:
center_max()
,
center_min()
,
furthest_from()
,
pair_extremes()
,
position_max()
,
position_min()
,
rev_windows()
,
roll_elements()
,
shuffle_hierarchy()
,
triplet_extremes()
Other distance functions:
dim_values()
,
distance()
,
expand_distances()
,
expand_distances_each()
,
furthest_from()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Closest to 3 in a vector closest_to_vec(1:10, origin = 3) # Closest to the third row (index of data.frame) closest_to(df, origin = 3)$index # By each of the columns closest_to(df, cols = "A", origin = 3)$A closest_to(df, cols = "A", origin_fn = most_centered)$A closest_to(df, cols = "B", origin = 0.5)$B closest_to(df, cols = "B", origin_fn = centroid)$B # Shuffle the elements with the same distance to the origin closest_to(df, cols = "A", origin_fn = create_origin_fn(median), shuffle_ties = TRUE )$A # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% closest_to( cols = "A", origin_fn = create_origin_fn(median) ) # Plot the rearranged values plot( x = 1:10, y = closest_to(df, cols = "B", origin_fn = create_origin_fn(median) )$B, xlab = "Position", ylab = "B" ) plot( x = 1:10, y = closest_to(df, cols = "A", origin_fn = create_origin_fn(median), shuffle_ties = TRUE )$A, xlab = "Position", ylab = "A" ) # In multiple dimensions df %>% closest_to(cols = c("A", "B"), origin_fn = most_centered)
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Closest to 3 in a vector closest_to_vec(1:10, origin = 3) # Closest to the third row (index of data.frame) closest_to(df, origin = 3)$index # By each of the columns closest_to(df, cols = "A", origin = 3)$A closest_to(df, cols = "A", origin_fn = most_centered)$A closest_to(df, cols = "B", origin = 0.5)$B closest_to(df, cols = "B", origin_fn = centroid)$B # Shuffle the elements with the same distance to the origin closest_to(df, cols = "A", origin_fn = create_origin_fn(median), shuffle_ties = TRUE )$A # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% closest_to( cols = "A", origin_fn = create_origin_fn(median) ) # Plot the rearranged values plot( x = 1:10, y = closest_to(df, cols = "B", origin_fn = create_origin_fn(median) )$B, xlab = "Position", ylab = "B" ) plot( x = 1:10, y = closest_to(df, cols = "A", origin_fn = create_origin_fn(median), shuffle_ties = TRUE )$A, xlab = "Position", ylab = "A" ) # In multiple dimensions df %>% closest_to(cols = c("A", "B"), origin_fn = most_centered)
Transform values such that the elements in each group move closer to their centroid.
cluster_groups( data, cols, group_cols = NULL, scale_min_fn = function(x) { quantile(x, 0.025) }, scale_max_fn = function(x) { quantile(x, 0.975) }, keep_centroids = FALSE, multiplier = 0.05, suffix = "_clustered", keep_original = TRUE, overwrite = FALSE )
cluster_groups( data, cols, group_cols = NULL, scale_min_fn = function(x) { quantile(x, 0.025) }, scale_max_fn = function(x) { quantile(x, 0.975) }, keep_centroids = FALSE, multiplier = 0.05, suffix = "_clustered", keep_original = TRUE, overwrite = FALSE )
data |
|
cols |
Names of columns in |
group_cols |
Names of grouping columns in If |
scale_min_fn , scale_max_fn
|
Function to find the minimum/maximum value in the original data when rescaling the contracted data. Input: A Output: A |
keep_centroids |
Whether to ensure the clusters have their original centroids. (Logical) |
multiplier |
Numeric constant to multiply the distance to the group centroid by. A smaller value makes the clusters more compact and vice versa. |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Contracts the distance from each data point to the centroid of its group.
Performs MinMax scaling such that the scale of the data points is similar to the original data.
If enabled (not default), the centroids are moved to the original centroids.
data.frame
(tibble
) with the clustered columns.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
flip_values()
,
roll_values()
,
rotate_2d()
,
rotate_3d()
,
shear_2d()
,
shear_3d()
,
swirl_2d()
,
swirl_3d()
Other clustering functions:
generate_clusters()
,
transfer_centroids()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(2) # Create a data frame df <- data.frame( "x" = runif(50), "y" = runif(50), "z" = runif(50), "g" = rep(c(1, 2, 3, 4, 5), each = 10) ) # Move the data points into clusters cluster_groups(df, cols = c("x", "y"), group_col = "g" ) cluster_groups(df, cols = c("x", "y"), group_col = "g", multiplier = 0.1 ) cluster_groups(df, cols = c("x"), group_col = "g", multiplier = 0.1 ) # # Plotting clusters # # Cluster x and y for each group in g df_clustered <- cluster_groups( data = df, cols = c("x", "y"), group_col = "g" ) # Plot the clusters over the original data points # As we work with random data, the cluster might overlap if (has_ggplot){ ggplot( df_clustered, aes(x = x_clustered, y = y_clustered, color = factor(g)) ) + # Original data geom_point(aes(x = x, y = y), alpha = 0.3, size = 0.8) + # Clustered data geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "g") } # # Maintain original group centroids # df_clustered <- cluster_groups( data = df, cols = c("x", "y"), group_col = "g", keep_centroids = TRUE ) # Plot the clusters over the original data points # As we work with random data, the cluster might overlap if (has_ggplot){ ggplot( df_clustered, aes(x = x_clustered, y = y_clustered, color = factor(g)) ) + # Original data geom_point(aes(x = x, y = y), alpha = 0.3, size = 0.8) + # Clustered data geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "g") } # # Three dimensions # # Cluster in 3d df_clustered <- cluster_groups( data = df, cols = c("x", "y", "z"), group_col = "g" ) ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_clustered$x_clustered, y = df_clustered$y_clustered, z = df_clustered$z_clustered, type = "scatter3d", mode = "markers", color = df_clustered$g ) ## End(Not run)
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(2) # Create a data frame df <- data.frame( "x" = runif(50), "y" = runif(50), "z" = runif(50), "g" = rep(c(1, 2, 3, 4, 5), each = 10) ) # Move the data points into clusters cluster_groups(df, cols = c("x", "y"), group_col = "g" ) cluster_groups(df, cols = c("x", "y"), group_col = "g", multiplier = 0.1 ) cluster_groups(df, cols = c("x"), group_col = "g", multiplier = 0.1 ) # # Plotting clusters # # Cluster x and y for each group in g df_clustered <- cluster_groups( data = df, cols = c("x", "y"), group_col = "g" ) # Plot the clusters over the original data points # As we work with random data, the cluster might overlap if (has_ggplot){ ggplot( df_clustered, aes(x = x_clustered, y = y_clustered, color = factor(g)) ) + # Original data geom_point(aes(x = x, y = y), alpha = 0.3, size = 0.8) + # Clustered data geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "g") } # # Maintain original group centroids # df_clustered <- cluster_groups( data = df, cols = c("x", "y"), group_col = "g", keep_centroids = TRUE ) # Plot the clusters over the original data points # As we work with random data, the cluster might overlap if (has_ggplot){ ggplot( df_clustered, aes(x = x_clustered, y = y_clustered, color = factor(g)) ) + # Original data geom_point(aes(x = x, y = y), alpha = 0.3, size = 0.8) + # Clustered data geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "g") } # # Three dimensions # # Cluster in 3d df_clustered <- cluster_groups( data = df, cols = c("x", "y", "z"), group_col = "g" ) ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_clustered$x_clustered, y = df_clustered$y_clustered, z = df_clustered$z_clustered, type = "scatter3d", mode = "markers", color = df_clustered$g ) ## End(Not run)
Creates a function that takes 2 inputs (`x`
, `d`
) and performs the operation:
Here, `x`
is the current value and `d`
is its distance to an origin.
The greater the distance, the more we will dim the value of `x`
.
With the default values, the returned function is:
function(x, d){
x * (1 / ((1 + d) ^ 2))
}
create_dimming_fn(numerator = 1, exponent = 2, add_to_distance = 1)
create_dimming_fn(numerator = 1, exponent = 2, add_to_distance = 1)
numerator |
The numerator. Defaults to |
exponent |
The exponent. Defaults to |
add_to_distance |
Constant to add to the distance before exponentiation.
Ensures dimming even when the distance ( |
Function with the arguments x
and d
,
with both expected to be numeric vector
s. More specifically:
function(x, d){
x * (numerator / ((add_to_distance + d) ^ exponent))
}
Ludvig Renbo Olsen, [email protected]
Other function creators:
create_n_fn()
,
create_origin_fn()
# Attach packages library(rearrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create two vectors x <- runif(10) d <- runif(10, max = 0.5) # Create dimming_fn with an add_to_distance of 0 # Note: In practice this risks zero-division non_smoothed_dimming_fn <- create_dimming_fn(add_to_distance = 0) non_smoothed_dimming_fn as.list(environment(non_smoothed_dimming_fn)) # Use median_origin_fn non_smoothed_dimming_fn(x, d) # Plotting the dimming # Create data.frame with distance-based dimming df <- data.frame( "x" = 1, "d" = 1:10 ) df$x_dimmed <- non_smoothed_dimming_fn(df$x, df$d) # Plot the dimming if (has_ggplot){ ggplot(df, aes(x=d, y=x_dimmed)) + geom_point() + geom_line() + theme_minimal() }
# Attach packages library(rearrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create two vectors x <- runif(10) d <- runif(10, max = 0.5) # Create dimming_fn with an add_to_distance of 0 # Note: In practice this risks zero-division non_smoothed_dimming_fn <- create_dimming_fn(add_to_distance = 0) non_smoothed_dimming_fn as.list(environment(non_smoothed_dimming_fn)) # Use median_origin_fn non_smoothed_dimming_fn(x, d) # Plotting the dimming # Create data.frame with distance-based dimming df <- data.frame( "x" = 1, "d" = 1:10 ) df$x_dimmed <- non_smoothed_dimming_fn(df$x, df$d) # Plot the dimming if (has_ggplot){ ggplot(df, aes(x=d, y=x_dimmed)) + geom_point() + geom_line() + theme_minimal() }
Creates a function that applies a supplied function to all input vectors, or their indices, and rounds the results.
As used with roll_elements()
. E.g. to
find the the median index in a subset of a grouped data.frame
.
create_n_fn(fn, use_index = FALSE, negate = FALSE, round_fn = round, ...)
create_n_fn(fn, use_index = FALSE, negate = FALSE, round_fn = round, ...)
fn |
Function to apply to each dimension. Should return a numeric scalar. |
use_index |
Whether to apply The indices are created with |
negate |
Whether to negate the result. I.e. to multiply it with |
round_fn |
Function for rounding results of Rounding is done prior to negation. E.g. To avoid rounding, supply |
... |
Arguments for |
Function with the dots (`...`
) argument
that applies the `fn`
function to
each element in `...`
(or indices thereof) (usually one vector per dimension).
The results are rounded with `round_fn`
.
Note: The dots argument in the generated function should not to be confused with the dots
argument in create_n_fn()
).
Ludvig Renbo Olsen, [email protected]
Other n functions:
median_index()
Other function creators:
create_dimming_fn()
,
create_origin_fn()
# Attach packages library(rearrr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Create n_fn that gets the median index # and rounds down with floor() median_index_fn <- create_n_fn(median, use_index = TRUE, round_fn = floor) # Use median_index_fn median_index_fn(x, y, z) # Create n_fn that gets the median of each dimension median_n_fn <- create_n_fn(median) # Use median_origin_fn median_n_fn(x, y, z) # Should be the same as round(c(median(x), median(y), median(z))) # Use mean and ignore missing values mean_n_fn <- create_n_fn(mean, na.rm = TRUE) # Add missing values x[[2]] <- NA y[[5]] <- NA # Use mean_n_fn mean_n_fn(x, y, z) # Should be the same as round(c( mean(x, na.rm = TRUE), mean(y, na.rm = TRUE), mean(z, na.rm = TRUE) ))
# Attach packages library(rearrr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Create n_fn that gets the median index # and rounds down with floor() median_index_fn <- create_n_fn(median, use_index = TRUE, round_fn = floor) # Use median_index_fn median_index_fn(x, y, z) # Create n_fn that gets the median of each dimension median_n_fn <- create_n_fn(median) # Use median_origin_fn median_n_fn(x, y, z) # Should be the same as round(c(median(x), median(y), median(z))) # Use mean and ignore missing values mean_n_fn <- create_n_fn(mean, na.rm = TRUE) # Add missing values x[[2]] <- NA y[[5]] <- NA # Use mean_n_fn mean_n_fn(x, y, z) # Should be the same as round(c( mean(x, na.rm = TRUE), mean(y, na.rm = TRUE), mean(z, na.rm = TRUE) ))
Creates a function that applies a supplied function to all input vectors.
create_origin_fn(fn, ...)
create_origin_fn(fn, ...)
fn |
Function to apply to each dimension. Should return a numeric scalar. |
... |
Arguments for |
Function with the dots (...
) argument that applies the `fn`
function to
each element in ...
(usually one vector per dimension).
Ludvig Renbo Olsen, [email protected]
Other coordinate functions:
centroid()
,
is_most_centered()
,
midrange()
,
most_centered()
Other function creators:
create_dimming_fn()
,
create_n_fn()
# Attach packages library(rearrr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Create origin_fn that gets the median of each dimension median_origin_fn <- create_origin_fn(median) # Use median_origin_fn median_origin_fn(x, y, z) # Should be the same as c(median(x), median(y), median(z)) # Use mean and ignore missing values mean_origin_fn <- create_origin_fn(mean, na.rm = TRUE) # Add missing values x[[2]] <- NA y[[5]] <- NA # Use mean_origin_fn mean_origin_fn(x, y, z) # Should be the same as c(mean(x, na.rm = TRUE), mean(y, na.rm = TRUE), mean(z, na.rm = TRUE) )
# Attach packages library(rearrr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Create origin_fn that gets the median of each dimension median_origin_fn <- create_origin_fn(median) # Use median_origin_fn median_origin_fn(x, y, z) # Should be the same as c(median(x), median(y), median(z)) # Use mean and ignore missing values mean_origin_fn <- create_origin_fn(mean, na.rm = TRUE) # Add missing values x[[2]] <- NA y[[5]] <- NA # Use mean_origin_fn mean_origin_fn(x, y, z) # Should be the same as c(mean(x, na.rm = TRUE), mean(y, na.rm = TRUE), mean(z, na.rm = TRUE) )
Convert degrees to radians or radians to degrees.
degrees_to_radians(degrees) radians_to_degrees(radians)
degrees_to_radians(degrees) radians_to_degrees(radians)
degrees |
|
radians |
|
vector
with converted degrees/radians.
Missing values (NA
s) are returned as they are.
Ludvig Renbo Olsen, [email protected]
# Attach packages library(rearrr) library(dplyr) # Degrees to radians degrees_to_radians(c(90, 180, 270)) # Radians to degrees radians_to_degrees(c(pi / 2, pi, 1.5 * pi)) # Get back the original degrees radians_to_degrees(degrees_to_radians(c(90, 180, 270)))
# Attach packages library(rearrr) library(dplyr) # Degrees to radians degrees_to_radians(c(90, 180, 270)) # Radians to degrees radians_to_degrees(c(pi / 2, pi, 1.5 * pi)) # Get back the original degrees radians_to_degrees(degrees_to_radians(c(90, 180, 270)))
Dims the values in the dimming dimension (last by default) based on the data point's distance to the origin.
Distance is calculated as:
The default `dimming_fn`
multiplies by the inverse-square of
and is calculated as:
Where is the value in the dimming dimension. The
is added
to ensure that values are dimmed even when the distance is below
1
. The quickest
way to change the exponent or the is with
create_dimming_fn()
.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and dimming around e.g. the centroid
of each group.
dim_values( data, cols, dimming_fn = create_dimming_fn(numerator = 1, exponent = 2, add_to_distance = 1), origin = NULL, origin_fn = NULL, dim_col = tail(cols, 1), suffix = "_dimmed", keep_original = TRUE, origin_col_name = ".origin", overwrite = FALSE )
dim_values( data, cols, dimming_fn = create_dimming_fn(numerator = 1, exponent = 2, add_to_distance = 1), origin = NULL, origin_fn = NULL, dim_col = tail(cols, 1), suffix = "_dimmed", keep_original = TRUE, origin_col_name = ".origin", overwrite = FALSE )
data |
|
cols |
Names of columns in N.B. when the dimming dimension is included in |
dimming_fn |
Input: Two (2) input arguments:
Output: A E.g.:
This kind of dimming function can be created with
|
origin |
Coordinates of the origin to dim around.
A scalar to use in all dimensions
or a N.B. Ignored when |
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
dim_col |
Name of column to dim. Default is the last column in When the |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Calculates distances to origin with:
Applies the `dimming_fn`
to the `dim_col`
based on the distances.
data.frame
(tibble
) with the dimmed column,
along with the origin coordinates.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
expand_distances()
,
expand_distances_each()
,
flip_values()
,
roll_values()
,
rotate_2d()
,
rotate_3d()
,
shear_2d()
,
shear_3d()
,
swirl_2d()
,
swirl_3d()
Other distance functions:
closest_to()
,
distance()
,
expand_distances()
,
expand_distances_each()
,
furthest_from()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(7) # Create a data frame with clusters df <- generate_clusters( num_rows = 70, num_cols = 3, num_clusters = 5, compactness = 1.6 ) %>% dplyr::rename(x = D1, y = D2, z = D3) %>% dplyr::mutate(o = 1) # Dim the values in the z column dim_values( data = df, cols = c("x", "y", "z"), origin = c(0.5, 0.5, 0.5) ) # Dim the values in the `o` column (all 1s) # around the centroid dim_values( data = df, cols = c("x", "y"), dim_col = "o", origin_fn = centroid ) # Specify dimming_fn # around the centroid dim_values( data = df, cols = c("x", "y"), dim_col = "o", origin_fn = centroid, dimming_fn = function(x, d) { x * 1 / (2^(1 + d)) } ) # # Dim cluster-wise # # Group-wise dimming df_dimmed <- df %>% dplyr::group_by(.cluster) %>% dim_values( cols = c("x", "y"), dim_col = "o", origin_fn = centroid ) # Plot the dimmed data such that the alpha (opacity) is # controlled by the dimming # (Note: This works because the `o` column is 1 for all values) if (has_ggplot){ ggplot( data = df_dimmed, aes(x = x, y = y, alpha = o_dimmed, color = .cluster) ) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "Cluster", alpha = "o_dimmed") }
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(7) # Create a data frame with clusters df <- generate_clusters( num_rows = 70, num_cols = 3, num_clusters = 5, compactness = 1.6 ) %>% dplyr::rename(x = D1, y = D2, z = D3) %>% dplyr::mutate(o = 1) # Dim the values in the z column dim_values( data = df, cols = c("x", "y", "z"), origin = c(0.5, 0.5, 0.5) ) # Dim the values in the `o` column (all 1s) # around the centroid dim_values( data = df, cols = c("x", "y"), dim_col = "o", origin_fn = centroid ) # Specify dimming_fn # around the centroid dim_values( data = df, cols = c("x", "y"), dim_col = "o", origin_fn = centroid, dimming_fn = function(x, d) { x * 1 / (2^(1 + d)) } ) # # Dim cluster-wise # # Group-wise dimming df_dimmed <- df %>% dplyr::group_by(.cluster) %>% dim_values( cols = c("x", "y"), dim_col = "o", origin_fn = centroid ) # Plot the dimmed data such that the alpha (opacity) is # controlled by the dimming # (Note: This works because the `o` column is 1 for all values) if (has_ggplot){ ggplot( data = df_dimmed, aes(x = x, y = y, alpha = o_dimmed, color = .cluster) ) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "Cluster", alpha = "o_dimmed") }
Calculates the distance to the specified origin with:
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and finding the distance to e.g. the centroid
of each group.
distance( data, cols = NULL, origin = NULL, origin_fn = NULL, distance_col_name = ".distance", origin_col_name = ".origin", overwrite = FALSE )
distance( data, cols = NULL, origin = NULL, origin_fn = NULL, distance_col_name = ".distance", origin_col_name = ".origin", overwrite = FALSE )
data |
|
cols |
Names of columns in |
origin |
Coordinates of the origin to calculate distances to.
A scalar to use in all dimensions
or a N.B. Ignored when |
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
distance_col_name |
Name of new column with the distances. |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
data.frame
(tibble
) with the additional columns (distances and origin coordinates).
Ludvig Renbo Olsen, [email protected]
Other measuring functions:
angle()
,
vector_length()
Other distance functions:
closest_to()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
furthest_from()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Calculate distances in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 distance( data = df, cols = c("x", "y"), origin = c(0.5, 0.5) ) # Calculate distances to the centroid for each group in 'g' distance( data = dplyr::group_by(df, g), cols = c("x", "y"), origin_fn = centroid )
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Calculate distances in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 distance( data = df, cols = c("x", "y"), origin = c(0.5, 0.5) ) # Calculate distances to the centroid for each group in 'g' distance( data = dplyr::group_by(df, g), cols = c("x", "y"), origin_fn = centroid )
Moves the data points in n-dimensional space such that their distance
to a specified origin is increased/decreased.
A `multiplier`
greater than 1 leads to expansion,
while a positive `multiplier`
lower than 1 leads to contraction.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and expanding around e.g. the centroid
of each group.
The multiplier/exponent can be supplied as a constant or as a function that returns a constant.
The latter can be useful when supplying a grouped data.frame
and the multiplier/exponent depends
on the data in the groups.
For expansion in each dimension separately, use expand_distances_each()
.
NOTE: When exponentiating, the default is to first add 1
to the distances,
to ensure expansion even when the distance is between 0
and 1
.
If you need the purely exponentiated distances,
disable `add_one_exp`
.
expand_distances( data, cols = NULL, multiplier = NULL, multiplier_fn = NULL, origin = NULL, origin_fn = NULL, exponentiate = FALSE, add_one_exp = TRUE, suffix = "_expanded", keep_original = TRUE, mult_col_name = ifelse(isTRUE(exponentiate), ".exponent", ".multiplier"), origin_col_name = ".origin", overwrite = FALSE )
expand_distances( data, cols = NULL, multiplier = NULL, multiplier_fn = NULL, origin = NULL, origin_fn = NULL, exponentiate = FALSE, add_one_exp = TRUE, suffix = "_expanded", keep_original = TRUE, mult_col_name = ifelse(isTRUE(exponentiate), ".exponent", ".multiplier"), origin_col_name = ".origin", overwrite = FALSE )
data |
|
cols |
Names of columns in |
multiplier |
Constant to multiply/exponentiate the distances to the origin by. N.B. When |
multiplier_fn |
Function for finding the Input: Each column will be passed as a Output: A |
origin |
Coordinates of the origin to expand around.
A scalar to use in all dimensions
or a N.B. Ignored when |
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
exponentiate |
Whether to exponentiate instead of multiplying. (Logical) |
add_one_exp |
Whether to add The distances to the origin (
N.B. Ignored when |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
mult_col_name |
Name of new column with the |
origin_col_name |
Name of new column with the origin coordinates.
If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Increases the distance to the origin in n-dimensional space by multiplying or exponentiating it by the multiplier.
We first move the origin to the zero-coordinates (e.g. c(0, 0, 0)
)
and normalize each vector to unit length. We then multiply this unit vector by the
multiplied/exponentiated distance and moves the origin back to its original coordinates.
The distance to the specified origin is calculated with:
Note: By default (when `add_one_exp`
is TRUE
),
we add 1
to the distance before the exponentiation
and subtract it afterwards. See `add_one_exp`
.
data.frame
(tibble
) with the expanded columns,
along with the applied multiplier/exponent and origin coordinates.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
dim_values()
,
expand_distances_each()
,
flip_values()
,
roll_values()
,
rotate_2d()
,
rotate_3d()
,
shear_2d()
,
shear_3d()
,
swirl_2d()
,
swirl_3d()
Other expander functions:
expand_distances_each()
Other distance functions:
closest_to()
,
dim_values()
,
distance()
,
expand_distances_each()
,
furthest_from()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Expand distances in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 # We multiply the distances by 2 expand_distances( data = df, cols = c("x", "y"), multiplier = 2, origin = c(0.5, 0.5) ) # Expand distances in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 # We exponentiate the distances by 2 expand_distances( data = df, cols = c("x", "y"), multiplier = 2, exponentiate = TRUE, origin = 0.5 ) # Expand values in one dimension (x) # With the origin at x=0.5 # We exponentiate the distances by 3 expand_distances( data = df, cols = c("x"), multiplier = 3, exponentiate = TRUE, origin = 0.5 ) # Expand x and y around the centroid # We use exponentiation for a more drastic effect # The add_one_exp makes sure it expands # even when x or y is in the range [0, <1] # To compare multiple exponents, we wrap the # call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 3, 5), .f = function(exponent) { expand_distances( data = df, cols = c("x", "y"), multiplier = exponent, origin_fn = centroid, exponentiate = TRUE, add_one_exp = TRUE ) } ) df_expanded # Plot the expansions of x and y around the overall centroid if (has_ggplot){ ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.exponent))) + geom_vline( xintercept = df_expanded[[".origin"]][[1]][[1]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_hline( yintercept = df_expanded[[".origin"]][[1]][[2]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_path(size = 0.2) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "Exponent") } # Expand x and y around the centroid using multiplication # To compare multiple multipliers, we wrap the # call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 3, 5), .f = function(multiplier) { expand_distances(df, cols = c("x", "y"), multiplier = multiplier, origin_fn = centroid, exponentiate = FALSE ) } ) df_expanded # Plot the expansions of x and y around the overall centroid if (has_ggplot){ ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.multiplier))) + geom_vline( xintercept = df_expanded[[".origin"]][[1]][[1]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_hline( yintercept = df_expanded[[".origin"]][[1]][[2]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_path(size = 0.2, alpha = .8) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "Multiplier") } # # Contraction # # Group-wise contraction to create clusters df_contracted <- df %>% dplyr::group_by(g) %>% expand_distances( cols = c("x", "y"), multiplier = 0.07, suffix = "_contracted", origin_fn = centroid ) # Plot the clustered data point on top of the original data points if (has_ggplot){ ggplot(df_contracted, aes(x = x_contracted, y = y_contracted, color = factor(g))) + geom_point(aes(x = x, y = y, color = factor(g)), alpha = 0.3, shape = 16) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "g") }
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Expand distances in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 # We multiply the distances by 2 expand_distances( data = df, cols = c("x", "y"), multiplier = 2, origin = c(0.5, 0.5) ) # Expand distances in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 # We exponentiate the distances by 2 expand_distances( data = df, cols = c("x", "y"), multiplier = 2, exponentiate = TRUE, origin = 0.5 ) # Expand values in one dimension (x) # With the origin at x=0.5 # We exponentiate the distances by 3 expand_distances( data = df, cols = c("x"), multiplier = 3, exponentiate = TRUE, origin = 0.5 ) # Expand x and y around the centroid # We use exponentiation for a more drastic effect # The add_one_exp makes sure it expands # even when x or y is in the range [0, <1] # To compare multiple exponents, we wrap the # call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 3, 5), .f = function(exponent) { expand_distances( data = df, cols = c("x", "y"), multiplier = exponent, origin_fn = centroid, exponentiate = TRUE, add_one_exp = TRUE ) } ) df_expanded # Plot the expansions of x and y around the overall centroid if (has_ggplot){ ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.exponent))) + geom_vline( xintercept = df_expanded[[".origin"]][[1]][[1]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_hline( yintercept = df_expanded[[".origin"]][[1]][[2]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_path(size = 0.2) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "Exponent") } # Expand x and y around the centroid using multiplication # To compare multiple multipliers, we wrap the # call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 3, 5), .f = function(multiplier) { expand_distances(df, cols = c("x", "y"), multiplier = multiplier, origin_fn = centroid, exponentiate = FALSE ) } ) df_expanded # Plot the expansions of x and y around the overall centroid if (has_ggplot){ ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.multiplier))) + geom_vline( xintercept = df_expanded[[".origin"]][[1]][[1]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_hline( yintercept = df_expanded[[".origin"]][[1]][[2]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_path(size = 0.2, alpha = .8) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "Multiplier") } # # Contraction # # Group-wise contraction to create clusters df_contracted <- df %>% dplyr::group_by(g) %>% expand_distances( cols = c("x", "y"), multiplier = 0.07, suffix = "_contracted", origin_fn = centroid ) # Plot the clustered data point on top of the original data points if (has_ggplot){ ggplot(df_contracted, aes(x = x_contracted, y = y_contracted, color = factor(g))) + geom_point(aes(x = x, y = y, color = factor(g)), alpha = 0.3, shape = 16) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "g") }
Moves the data points in n-dimensional space such that their distance
to the specified origin is increased/decreased in each dimension separately.
A `multiplier`
greater than 1 leads to expansion,
while a positive `multiplier`
lower than 1 leads to contraction.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and expanding around e.g. the centroid
of each group.
The multipliers/exponents can be supplied as constant(s) or as a function that returns constants.
The latter can be useful when supplying a grouped data.frame
and the multiplier/exponent depends
on the data in the groups.
If supplying multiple constants, there must be one per dimension (length of `cols`
).
For expansion of the multidimensional distance, use expand_distances()
.
NOTE: When exponentiating, the default is to first add 1
or -1
(depending on the sign of the distance) to the distances,
to ensure expansion even when the distance is between -1
and 1
.
If you need the purely exponentiated distances,
disable `add_one_exp`
.
expand_distances_each( data, cols = NULL, multipliers = NULL, multipliers_fn = NULL, origin = NULL, origin_fn = NULL, exponentiate = FALSE, add_one_exp = TRUE, suffix = "_expanded", keep_original = TRUE, mult_col_name = ifelse(isTRUE(exponentiate), ".exponents", ".multipliers"), origin_col_name = ".origin", overwrite = FALSE )
expand_distances_each( data, cols = NULL, multipliers = NULL, multipliers_fn = NULL, origin = NULL, origin_fn = NULL, exponentiate = FALSE, add_one_exp = TRUE, suffix = "_expanded", keep_original = TRUE, mult_col_name = ifelse(isTRUE(exponentiate), ".exponents", ".multipliers"), origin_col_name = ".origin", overwrite = FALSE )
data |
|
cols |
Names of columns in |
multipliers |
Constant(s) to multiply/exponentiate the distance to the origin by.
A scalar to use in all dimensions or
a N.B. When |
multipliers_fn |
Function for finding the Input: Each column will be passed as a Output: A Just as for |
origin |
Coordinates of the origin to expand around.
A scalar to use in all dimensions
or a N.B. Ignored when |
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
exponentiate |
Whether to exponentiate instead of multiplying. (Logical) |
add_one_exp |
Whether to add the Exponentiation becomes:
N.B. Ignored when |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
mult_col_name |
Name of new column with the multiplier(s). If |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
For each value of each dimension (column), either multiply or exponentiate by the multiplier:
# Multiplication
x <- x * multiplier
# Exponentiation
x <- sign(x) * abs(x) ^ multiplier
Note: By default (when `add_one_exp`
is TRUE
),
we add the sign (1 / -1
) of the value before the exponentiation
and subtract it afterwards. See `add_one_exp`
.
data.frame
(tibble
) with the expanded columns,
along with the applied multiplier/exponent and origin coordinates.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
dim_values()
,
expand_distances()
,
flip_values()
,
roll_values()
,
rotate_2d()
,
rotate_3d()
,
shear_2d()
,
shear_3d()
,
swirl_2d()
,
swirl_3d()
Other expander functions:
expand_distances()
Other distance functions:
closest_to()
,
dim_values()
,
distance()
,
expand_distances()
,
furthest_from()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Expand values in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 # We expand x by 2 and y by 4 expand_distances_each( data = df, cols = c("x", "y"), multipliers = c(2, 4), origin = c(0.5, 0.5) ) # Expand values in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 # We expand both by 3 expand_distances_each( data = df, cols = c("x", "y"), multipliers = 3, origin = 0.5 ) # Expand values in one dimension (x) # With the origin at x=0.5 # We expand by 3 expand_distances_each( data = df, cols = c("x"), multipliers = 3, origin = 0.5 ) # Expand x and y around the centroid # We use exponentiation for a more drastic effect # The add_one_exp makes sure it expands # even when x or y is in the range [>-1, <1] # To compare multiple exponents, we wrap the # call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 2.0, 3.0, 4.0), .f = function(exponent) { expand_distances_each( data = df, cols = c("x", "y"), multipliers = exponent, origin_fn = centroid, exponentiate = TRUE, add_one_exp = TRUE ) } ) df_expanded # Plot the expansions of x and y around the overall centroid if (has_ggplot){ ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.exponents))) + geom_vline( xintercept = df_expanded[[".origin"]][[1]][[1]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_hline( yintercept = df_expanded[[".origin"]][[1]][[2]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "Exponent") } # Expand x and y around the centroid using multiplication # To compare multiple multipliers, we wrap the # call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 2.0, 3.0, 4.0), .f = function(multiplier) { expand_distances_each(df, cols = c("x", "y"), multipliers = multiplier, origin_fn = centroid, exponentiate = FALSE ) } ) df_expanded # Plot the expansions of x and y around the overall centroid if (has_ggplot){ ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.multipliers))) + geom_vline( xintercept = df_expanded[[".origin"]][[1]][[1]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_hline( yintercept = df_expanded[[".origin"]][[1]][[2]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "Multiplier") } # Expand x and y with different multipliers # around the centroid using multiplication df_expanded <- expand_distances_each( df, cols = c("x", "y"), multipliers = c(1.25, 10), origin_fn = centroid, exponentiate = FALSE ) df_expanded # Plot the expansions of x and y around the overall centroid # Note how the y axis is expanded a lot more than the x-axis if (has_ggplot){ ggplot(df_expanded, aes(x = x_expanded, y = y_expanded)) + geom_vline( xintercept = df_expanded[[".origin"]][[1]][[1]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_hline( yintercept = df_expanded[[".origin"]][[1]][[2]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_line(aes(color = "Expanded")) + geom_point(aes(color = "Expanded")) + geom_line(aes(x = x, y = y, color = "Original")) + geom_point(aes(x = x, y = y, color = "Original")) + theme_minimal() + labs(x = "x", y = "y", color = "Multiplier") } # # Contraction # # Group-wise contraction to create clusters df_contracted <- df %>% dplyr::group_by(g) %>% expand_distances_each( cols = c("x", "y"), multipliers = 0.07, suffix = "_contracted", origin_fn = centroid ) # Plot the clustered data point on top of the original data points if (has_ggplot){ ggplot(df_contracted, aes(x = x_contracted, y = y_contracted, color = factor(g))) + geom_point(aes(x = x, y = y, color = factor(g)), alpha = 0.3, shape = 16) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "g") }
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Expand values in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 # We expand x by 2 and y by 4 expand_distances_each( data = df, cols = c("x", "y"), multipliers = c(2, 4), origin = c(0.5, 0.5) ) # Expand values in the two dimensions (x and y) # With the origin at x=0.5, y=0.5 # We expand both by 3 expand_distances_each( data = df, cols = c("x", "y"), multipliers = 3, origin = 0.5 ) # Expand values in one dimension (x) # With the origin at x=0.5 # We expand by 3 expand_distances_each( data = df, cols = c("x"), multipliers = 3, origin = 0.5 ) # Expand x and y around the centroid # We use exponentiation for a more drastic effect # The add_one_exp makes sure it expands # even when x or y is in the range [>-1, <1] # To compare multiple exponents, we wrap the # call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 2.0, 3.0, 4.0), .f = function(exponent) { expand_distances_each( data = df, cols = c("x", "y"), multipliers = exponent, origin_fn = centroid, exponentiate = TRUE, add_one_exp = TRUE ) } ) df_expanded # Plot the expansions of x and y around the overall centroid if (has_ggplot){ ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.exponents))) + geom_vline( xintercept = df_expanded[[".origin"]][[1]][[1]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_hline( yintercept = df_expanded[[".origin"]][[1]][[2]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "Exponent") } # Expand x and y around the centroid using multiplication # To compare multiple multipliers, we wrap the # call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 2.0, 3.0, 4.0), .f = function(multiplier) { expand_distances_each(df, cols = c("x", "y"), multipliers = multiplier, origin_fn = centroid, exponentiate = FALSE ) } ) df_expanded # Plot the expansions of x and y around the overall centroid if (has_ggplot){ ggplot(df_expanded, aes(x = x_expanded, y = y_expanded, color = factor(.multipliers))) + geom_vline( xintercept = df_expanded[[".origin"]][[1]][[1]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_hline( yintercept = df_expanded[[".origin"]][[1]][[2]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "Multiplier") } # Expand x and y with different multipliers # around the centroid using multiplication df_expanded <- expand_distances_each( df, cols = c("x", "y"), multipliers = c(1.25, 10), origin_fn = centroid, exponentiate = FALSE ) df_expanded # Plot the expansions of x and y around the overall centroid # Note how the y axis is expanded a lot more than the x-axis if (has_ggplot){ ggplot(df_expanded, aes(x = x_expanded, y = y_expanded)) + geom_vline( xintercept = df_expanded[[".origin"]][[1]][[1]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_hline( yintercept = df_expanded[[".origin"]][[1]][[2]], size = 0.2, alpha = .4, linetype = "dashed" ) + geom_line(aes(color = "Expanded")) + geom_point(aes(color = "Expanded")) + geom_line(aes(x = x, y = y, color = "Original")) + geom_point(aes(x = x, y = y, color = "Original")) + theme_minimal() + labs(x = "x", y = "y", color = "Multiplier") } # # Contraction # # Group-wise contraction to create clusters df_contracted <- df %>% dplyr::group_by(g) %>% expand_distances_each( cols = c("x", "y"), multipliers = 0.07, suffix = "_contracted", origin_fn = centroid ) # Plot the clustered data point on top of the original data points if (has_ggplot){ ggplot(df_contracted, aes(x = x_contracted, y = y_contracted, color = factor(g))) + geom_point(aes(x = x, y = y, color = factor(g)), alpha = 0.3, shape = 16) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "g") }
Build a pipeline of transformations to be applied sequentially.
Specify different argument values for each group in a fixed set of groups.
E.g. if your data.frame
contains 5 groups, you provide 5 argument values
for each of the non-constant arguments (see `var_args`
).
The number of expected groups is specified during initialization and the input
`data`
must be grouped such that it contains that exact number of groups.
Transformations are applied to groups separately, why the given transformation function
only receives the subset of `data`
belonging to the current group.
Standard workflow: Instantiate pipeline -> Add transformations -> Apply to data
To apply the same arguments to all groups, see
Pipeline
.
To apply generated argument values to an arbitrary number of groups,
see GeneratedPipeline
.
rearrr::Pipeline
-> FixedGroupsPipeline
transformations
list
of transformations to apply.
names
Names of the transformations.
num_groups
Number of groups the pipeline will be applied to.
new()
Initialize the pipeline with the number of groups the pipeline will be applied to.
FixedGroupsPipeline$new(num_groups)
num_groups
Number of groups the pipeline will be applied to.
add_transformation()
Add a transformation to the pipeline.
FixedGroupsPipeline$add_transformation(fn, args, var_args, name)
fn
Function that performs the transformation.
args
Named list
with arguments for the `fn`
function.
var_args
Named list
of arguments with list
of differing
values for each group.
E.g. list("a" = list(1, 2, 3), "b" = list("a", "b", "c"))
given 3 groups.
By adding ".apply"
with a list of TRUE
/FALSE
flags, the transformation
can be disabled for a specific group.
E.g. list(".apply" = list(TRUE, FALSE, TRUE), ...
.
name
Name of the transformation step. Must be unique.
The pipeline. To allow chaining of methods.
apply()
Apply the pipeline to a data.frame
.
FixedGroupsPipeline$apply(data, verbose = FALSE)
data
data.frame
with the same number of groups as pre-registered
in the pipeline.
You can find the number of groups in `data`
with `dplyr::n_groups(data)`
.
The number of groups expected by the pipeline can be accessed with `pipe$num_groups`
.
verbose
Whether to print the progress.
Transformed version of `data`
.
print()
Print an overview of the pipeline.
FixedGroupsPipeline$print(...)
...
further arguments passed to or from other methods.
The pipeline. To allow chaining of methods.
clone()
The objects of this class are cloneable with this method.
FixedGroupsPipeline$clone(deep = FALSE)
deep
Whether to make a deep clone.
Ludvig Renbo Olsen, [email protected]
Other pipelines:
GeneratedPipeline
,
Pipeline
# Attach package library(rearrr) library(dplyr) # Create a data frame # We group it by G so we have 3 groups df <- data.frame( "Index" = 1:12, "A" = c(1:4, 9:12, 15:18), "G" = rep(1:3, each = 4) ) %>% dplyr::group_by(G) # Create new pipeline pipe <- FixedGroupsPipeline$new(num_groups = 3) # Add 2D rotation transformation pipe$add_transformation( fn = rotate_2d, args = list( x_col = "Index", y_col = "A", suffix = "", overwrite = TRUE ), var_args = list( degrees = list(45, 90, 180), origin = list(c(0, 0), c(1, 2), c(-1, 0)) ), name = "Rotate" ) # Add the `cluster_group` transformation # As the function is fed an ungrouped subset of `data`, # i.e. the rows of that group, we need to specify `group_cols` in `args` # That is specific to `cluster_groups()` though # Also note `.apply` in `var_args` which tells the pipeline *not* # to apply this transformation to the second group pipe$add_transformation( fn = cluster_groups, args = list( cols = c("Index", "A"), suffix = "", overwrite = TRUE, group_cols = "G" ), var_args = list( multiplier = list(0.5, 1, 5), .apply = list(TRUE, FALSE, TRUE) ), name = "Cluster" ) # Check pipeline object pipe # Apply pipeline to already grouped data.frame # Enable `verbose` to print progress pipe$apply(df, verbose = TRUE)
# Attach package library(rearrr) library(dplyr) # Create a data frame # We group it by G so we have 3 groups df <- data.frame( "Index" = 1:12, "A" = c(1:4, 9:12, 15:18), "G" = rep(1:3, each = 4) ) %>% dplyr::group_by(G) # Create new pipeline pipe <- FixedGroupsPipeline$new(num_groups = 3) # Add 2D rotation transformation pipe$add_transformation( fn = rotate_2d, args = list( x_col = "Index", y_col = "A", suffix = "", overwrite = TRUE ), var_args = list( degrees = list(45, 90, 180), origin = list(c(0, 0), c(1, 2), c(-1, 0)) ), name = "Rotate" ) # Add the `cluster_group` transformation # As the function is fed an ungrouped subset of `data`, # i.e. the rows of that group, we need to specify `group_cols` in `args` # That is specific to `cluster_groups()` though # Also note `.apply` in `var_args` which tells the pipeline *not* # to apply this transformation to the second group pipe$add_transformation( fn = cluster_groups, args = list( cols = c("Index", "A"), suffix = "", overwrite = TRUE, group_cols = "G" ), var_args = list( multiplier = list(0.5, 1, 5), .apply = list(TRUE, FALSE, TRUE) ), name = "Cluster" ) # Check pipeline object pipe # Apply pipeline to already grouped data.frame # Enable `verbose` to print progress pipe$apply(df, verbose = TRUE)
Container for the type of transformation used in
FixedGroupsPipeline
.
Note: For internal use.
rearrr::Transformation
-> FixedGroupsTransformation
name
Name of transformation.
fn
Transformation function.
args
list
of constant arguments for `fn`
.
var_args
list
of arguments for `fn`
with different values per group.
num_groups
Number of groups that the transformation expects.
apply_arg
list
of TRUE
/FALSE
flags indicating
whether the transformation should be applied to each of the groups.
When `NULL`
, the transformation is applied to all groups.
new()
Initialize transformation.
FixedGroupsTransformation$new(fn, args, var_args, name = NULL)
fn
Transformation function.
args
list
of constant arguments for `fn`
.
var_args
list
of arguments for `fn`
with different values per group.
Each argument should have a list of values (one per group).
By adding ".apply"
with a list of TRUE
/FALSE
flags, the transformation
can be disabled for a specific group.
E.g. list(".apply" = list(TRUE, FALSE, TRUE), ...
.
name
Name of transformation.
get_group_args()
Get arguments for specific group ID.
FixedGroupsTransformation$get_group_args(group_id)
group_id
ID of the group to get arguments for.
list
of arguments.
apply()
Apply the transformation to a data.frame
.
FixedGroupsTransformation$apply(data)
data
data.frame
with the expected number of groups.
Transformed version of `data`
.
print()
Print an overview of the transformation.
FixedGroupsTransformation$print(..., indent = 0, show_class = TRUE)
...
further arguments passed to or from other methods.
indent
How many spaces to indent when printing.
show_class
Whether to print the transformation class name.
The pipeline. To allow chaining of methods.
clone()
The objects of this class are cloneable with this method.
FixedGroupsTransformation$clone(deep = FALSE)
deep
Whether to make a deep clone.
Ludvig Renbo Olsen, [email protected]
Other transformation classes:
GeneratedTransformation
,
Transformation
The values are flipped with the formula where x is the value and c is
the origin coordinate to flip the values around.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and flipping around e.g. the centroid
of each group. By default the median value in each dimension is used.
The *_vec()
version take and return a vector.
Example:
The column values:
c(5, 2, 7, 4, 3, 1)
and the origin_fn = create_origin_fn(median)
Changes the values to :
c(2, 5, 0, 3, 4, 6)
flip_values( data, cols = NULL, origin = NULL, origin_fn = create_origin_fn(median), suffix = "_flipped", keep_original = TRUE, origin_col_name = ".origin", overwrite = FALSE ) flip_values_vec(data, origin = NULL, origin_fn = create_origin_fn(median))
flip_values( data, cols = NULL, origin = NULL, origin_fn = create_origin_fn(median), suffix = "_flipped", keep_original = TRUE, origin_col_name = ".origin", overwrite = FALSE ) flip_values_vec(data, origin = NULL, origin_fn = create_origin_fn(median))
data |
|
cols |
Names of columns in |
origin |
Coordinates of the origin to flip around.
A scalar to use in all dimensions (columns)
or a N.B. Ignored when |
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
roll_values()
,
rotate_2d()
,
rotate_3d()
,
shear_2d()
,
shear_3d()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "Index" = 1:10, "A" = sample(1:10), "B" = runif(10), "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Flip values of the columns flip_values(df$A) flip_values(df, cols = "A") flip_values(df, cols = "B", origin = 0.3, origin_fn = NULL, keep_original = FALSE) flip_values(df, cols = c("A", "B"), origin = c(3, 0.3), origin_fn = NULL, suffix = "", keep_original = FALSE, overwrite = TRUE ) flip_values(df, cols = c("A", "B"), origin_fn = create_origin_fn(max)) # Grouped by G df %>% dplyr::group_by(G) %>% flip_values( cols = c("A", "B"), origin_fn = create_origin_fn(median), keep_original = FALSE ) # Plot A and flipped A # First flip A around the median and then around the value 3. df <- df %>% flip_values(cols = "A", suffix = "_flip_median", origin_col_name = NULL) %>% flip_values(cols = "A", suffix = "_flip_3", origin = 3, origin_fn = NULL, origin_col_name = NULL) # Plot A and A flipped around its median if (has_ggplot){ ggplot(df, aes(x = Index, y = A)) + geom_line(aes(color = "A")) + geom_line(aes(y = A_flip_median, color = "Flipped A (median)")) + geom_hline(aes(color = "Median A", yintercept = median(A))) + theme_minimal() } # Plot A and A flipped around the value 3 if (has_ggplot){ ggplot(df, aes(x = Index, y = A)) + geom_line(aes(color = "A")) + geom_line(aes(y = A_flip_3, color = "Flipped A (3)")) + geom_hline(aes(color = "3", yintercept = 3)) + theme_minimal() }
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "Index" = 1:10, "A" = sample(1:10), "B" = runif(10), "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Flip values of the columns flip_values(df$A) flip_values(df, cols = "A") flip_values(df, cols = "B", origin = 0.3, origin_fn = NULL, keep_original = FALSE) flip_values(df, cols = c("A", "B"), origin = c(3, 0.3), origin_fn = NULL, suffix = "", keep_original = FALSE, overwrite = TRUE ) flip_values(df, cols = c("A", "B"), origin_fn = create_origin_fn(max)) # Grouped by G df %>% dplyr::group_by(G) %>% flip_values( cols = c("A", "B"), origin_fn = create_origin_fn(median), keep_original = FALSE ) # Plot A and flipped A # First flip A around the median and then around the value 3. df <- df %>% flip_values(cols = "A", suffix = "_flip_median", origin_col_name = NULL) %>% flip_values(cols = "A", suffix = "_flip_3", origin = 3, origin_fn = NULL, origin_col_name = NULL) # Plot A and A flipped around its median if (has_ggplot){ ggplot(df, aes(x = Index, y = A)) + geom_line(aes(color = "A")) + geom_line(aes(y = A_flip_median, color = "Flipped A (median)")) + geom_hline(aes(color = "Median A", yintercept = median(A))) + theme_minimal() } # Plot A and A flipped around the value 3 if (has_ggplot){ ggplot(df, aes(x = Index, y = A)) + geom_line(aes(color = "A")) + geom_line(aes(y = A_flip_3, color = "Flipped A (3)")) + geom_hline(aes(color = "3", yintercept = 3)) + theme_minimal() }
Values are ordered by how far they are from the origin.
In 1d (when
`cols`
has length 1
), the origin can be thought of as a target value.
In n dimensions, the origin can be thought of as coordinates.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and ordering the rows by
their distance to the centroid of each group.
The *_vec()
version takes and returns a vector
.
Example:
The column values:
c(1, 2, 3, 4, 5)
and origin = 2
are ordered as:
c(5, 4, 1, 3,
2
)
furthest_from( data, cols = NULL, origin = NULL, origin_fn = NULL, shuffle_ties = FALSE, origin_col_name = ".origin", distance_col_name = ".distance", overwrite = FALSE ) furthest_from_vec(data, origin = NULL, origin_fn = NULL, shuffle_ties = FALSE)
furthest_from( data, cols = NULL, origin = NULL, origin_fn = NULL, shuffle_ties = FALSE, origin_col_name = ".origin", distance_col_name = ".distance", overwrite = FALSE ) furthest_from_vec(data, origin = NULL, origin_fn = NULL, shuffle_ties = FALSE)
data |
|
cols |
Column(s) to create sorting factor by.
When |
origin |
Coordinates of the origin to calculate distances to.
A scalar to use in all dimensions
or a N.B. Ignored when |
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
shuffle_ties |
Whether to shuffle elements with the same distance to the origin. (Logical) |
origin_col_name |
Name of new column with the origin coordinates. If |
distance_col_name |
Name of new column with the distances to the origin. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
The sorted data.frame
(tibble
) / vector
.
Ludvig Renbo Olsen, [email protected]
Other rearrange functions:
center_max()
,
center_min()
,
closest_to()
,
pair_extremes()
,
position_max()
,
position_min()
,
rev_windows()
,
roll_elements()
,
shuffle_hierarchy()
,
triplet_extremes()
Other distance functions:
closest_to()
,
dim_values()
,
distance()
,
expand_distances()
,
expand_distances_each()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Furthest from 3 in a vector furthest_from_vec(1:10, origin = 3) # Furthest from the third row (index of data.frame) furthest_from(df, origin = 3)$index # By each of the columns furthest_from(df, cols = "A", origin = 3)$A furthest_from(df, cols = "A", origin_fn = most_centered)$A furthest_from(df, cols = "B", origin = 0.5)$B furthest_from(df, cols = "B", origin_fn = centroid)$B # Shuffle the elements with the same distance to the origin furthest_from(df, cols = "A", origin_fn = create_origin_fn(median), shuffle_ties = TRUE )$A # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% furthest_from( cols = "A", origin_fn = create_origin_fn(median) ) # Plot the rearranged values plot( x = 1:10, y = furthest_from(df, cols = "B", origin_fn = create_origin_fn(median) )$B, xlab = "Position", ylab = "B" ) plot( x = 1:10, y = furthest_from(df, cols = "A", origin_fn = create_origin_fn(median), shuffle_ties = TRUE )$A, xlab = "Position", ylab = "A" ) # In multiple dimensions df %>% furthest_from(cols = c("A", "B"), origin_fn = most_centered)
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Furthest from 3 in a vector furthest_from_vec(1:10, origin = 3) # Furthest from the third row (index of data.frame) furthest_from(df, origin = 3)$index # By each of the columns furthest_from(df, cols = "A", origin = 3)$A furthest_from(df, cols = "A", origin_fn = most_centered)$A furthest_from(df, cols = "B", origin = 0.5)$B furthest_from(df, cols = "B", origin_fn = centroid)$B # Shuffle the elements with the same distance to the origin furthest_from(df, cols = "A", origin_fn = create_origin_fn(median), shuffle_ties = TRUE )$A # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% furthest_from( cols = "A", origin_fn = create_origin_fn(median) ) # Plot the rearranged values plot( x = 1:10, y = furthest_from(df, cols = "B", origin_fn = create_origin_fn(median) )$B, xlab = "Position", ylab = "B" ) plot( x = 1:10, y = furthest_from(df, cols = "A", origin_fn = create_origin_fn(median), shuffle_ties = TRUE )$A, xlab = "Position", ylab = "A" ) # In multiple dimensions df %>% furthest_from(cols = c("A", "B"), origin_fn = most_centered)
Generates data.frame
(tibble
) with clustered groups.
generate_clusters( num_rows, num_cols, num_clusters, compactness = 1.6, generator = runif, name_prefix = "D", cluster_col_name = ".cluster" )
generate_clusters( num_rows, num_cols, num_clusters, compactness = 1.6, generator = runif, name_prefix = "D", cluster_col_name = ".cluster" )
num_rows |
Number of rows. |
num_cols |
Number of columns (dimensions). |
num_clusters |
Number of clusters. |
compactness |
How compact the clusters should be. A larger value leads to more compact clusters (on average). Technically, it is passed to the |
generator |
Function to generate the numeric values. Must have the number of values to generate as its first (and only required) argument, as that is the only argument we pass to it. |
name_prefix |
Prefix string for naming columns. |
cluster_col_name |
Name of cluster factor. |
Generates data.frame
with random values using the `generator`
.
Divides the rows into groups (the clusters).
Contracts the distance from each data point to the centroid of its group.
Performs MinMax scaling such that the scale of the data points is similar to the generated data.
data.frame
(tibble
) with the clustered columns and the cluster grouping factor.
Ludvig Renbo Olsen, [email protected]
Other clustering functions:
cluster_groups()
,
transfer_centroids()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(10) # Generate clusters generate_clusters(num_rows = 20, num_cols = 3, num_clusters = 3, compactness = 1.6) generate_clusters(num_rows = 20, num_cols = 5, num_clusters = 6, compactness = 2.5) # Generate clusters and plot them # Tip: Call this multiple times # to see the behavior of `generate_clusters()` if (has_ggplot){ generate_clusters( num_rows = 50, num_cols = 2, num_clusters = 5, compactness = 1.6 ) %>% ggplot( aes(x = D1, y = D2, color = .cluster) ) + geom_point() + theme_minimal() + labs(x = "D1", y = "D2", color = "Cluster") } # # Plot clusters in 3d view # # Generate clusters clusters <- generate_clusters( num_rows = 50, num_cols = 3, num_clusters = 5, compactness = 1.6 ) ## Not run: # Plot 3d with plotly plotly::plot_ly( x = clusters$D1, y = clusters$D2, z = clusters$D3, type = "scatter3d", mode = "markers", color = clusters$.cluster ) ## End(Not run)
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(10) # Generate clusters generate_clusters(num_rows = 20, num_cols = 3, num_clusters = 3, compactness = 1.6) generate_clusters(num_rows = 20, num_cols = 5, num_clusters = 6, compactness = 2.5) # Generate clusters and plot them # Tip: Call this multiple times # to see the behavior of `generate_clusters()` if (has_ggplot){ generate_clusters( num_rows = 50, num_cols = 2, num_clusters = 5, compactness = 1.6 ) %>% ggplot( aes(x = D1, y = D2, color = .cluster) ) + geom_point() + theme_minimal() + labs(x = "D1", y = "D2", color = "Cluster") } # # Plot clusters in 3d view # # Generate clusters clusters <- generate_clusters( num_rows = 50, num_cols = 3, num_clusters = 5, compactness = 1.6 ) ## Not run: # Plot 3d with plotly plotly::plot_ly( x = clusters$D1, y = clusters$D2, z = clusters$D3, type = "scatter3d", mode = "markers", color = clusters$.cluster ) ## End(Not run)
Build a pipeline of transformations to be applied sequentially.
Generate argument values for selected arguments with a given set of generators.
E.g. randomly generate argument values for each group in a data.frame
.
Groupings are reset between each transformation. See group_cols
.
Standard workflow: Instantiate pipeline -> Add transformations -> Apply to data
To apply the same arguments to all groups, see
Pipeline
.
To apply different but specified argument values to a fixed set of groups,
see FixedGroupsPipeline
.
rearrr::Pipeline
-> GeneratedPipeline
transformations
list
of transformations to apply.
names
Names of the transformations.
add_transformation()
Add a transformation to the pipeline.
GeneratedPipeline$add_transformation( fn, args, generators, name, group_cols = NULL )
fn
Function that performs the transformation.
args
Named list
with arguments for the `fn`
function.
generators
Named list
of functions for generating argument values
for a single call of `fn`
.
It is possible to include an apply generator for deciding whether
the transformation should be applied to the current group or not.
This is done by adding a function with the name `.apply`
to the `generators`
list.
E.g. ".apply" = function(){sample(c(TRUE, FALSE), 1)}
.
name
Name of the transformation step. Must be unique.
group_cols
Names of the columns to group the input data by before applying the transformation.
Note that the transformation function is applied separately to each group (subset).
If the `fn`
function requires access to the entire data.frame
, the
grouping columns should be specified as part of `args`
and
handled by the `fn`
function.
The pipeline. To allow chaining of methods.
# `generators` is a list of functions for generating # argument values for a chosen set of arguments # `.apply` can be used to disable the transformation generators = list(degrees = function(){sample.int(360, 1)}, origin = function(){rnorm(2)}, .apply = function(){sample(c(TRUE, FALSE), 1)})
print()
Print an overview of the pipeline.
GeneratedPipeline$print(...)
...
further arguments passed to or from other methods.
The pipeline. To allow chaining of methods.
clone()
The objects of this class are cloneable with this method.
GeneratedPipeline$clone(deep = FALSE)
deep
Whether to make a deep clone.
Ludvig Renbo Olsen, [email protected]
Other pipelines:
FixedGroupsPipeline
,
Pipeline
# Attach package library(rearrr) # Create a data frame df <- data.frame( "Index" = 1:12, "A" = c(1:4, 9:12, 15:18), "G" = rep(1:3, each = 4) ) # Create new pipeline pipe <- GeneratedPipeline$new() # Add 2D rotation transformation # Note that we specify the grouping via `group_cols` pipe$add_transformation( fn = rotate_2d, args = list( x_col = "Index", y_col = "A", suffix = "", overwrite = TRUE ), generators = list(degrees = function(){sample.int(360, 1)}, origin = function(){rnorm(2)}), name = "Rotate", group_cols = "G" ) # Add the `cluster_group` transformation # Note that this function requires the entire input data # to properly scale the groups. We therefore specify `group_cols` # as part of `args`. This works as `cluster_groups()` accepts that # argument. # Also note the `.apply` generator which generates a TRUE/FALSE scalar # for whether the transformation should be applied to the current group pipe$add_transformation( fn = cluster_groups, args = list( cols = c("Index", "A"), suffix = "", overwrite = TRUE, group_cols = "G" ), generators = list( multiplier = function() { 0.1 * runif(1) * 3 ^ sample.int(5, 1) }, .apply = function(){sample(c(TRUE, FALSE), 1)} ), name = "Cluster" ) # Check pipeline object pipe # Apply pipeline to data.frame # Enable `verbose` to print progress pipe$apply(df, verbose = TRUE) ## ------------------------------------------------ ## Method `GeneratedPipeline$add_transformation` ## ------------------------------------------------ # `generators` is a list of functions for generating # argument values for a chosen set of arguments # `.apply` can be used to disable the transformation generators = list(degrees = function(){sample.int(360, 1)}, origin = function(){rnorm(2)}, .apply = function(){sample(c(TRUE, FALSE), 1)})
# Attach package library(rearrr) # Create a data frame df <- data.frame( "Index" = 1:12, "A" = c(1:4, 9:12, 15:18), "G" = rep(1:3, each = 4) ) # Create new pipeline pipe <- GeneratedPipeline$new() # Add 2D rotation transformation # Note that we specify the grouping via `group_cols` pipe$add_transformation( fn = rotate_2d, args = list( x_col = "Index", y_col = "A", suffix = "", overwrite = TRUE ), generators = list(degrees = function(){sample.int(360, 1)}, origin = function(){rnorm(2)}), name = "Rotate", group_cols = "G" ) # Add the `cluster_group` transformation # Note that this function requires the entire input data # to properly scale the groups. We therefore specify `group_cols` # as part of `args`. This works as `cluster_groups()` accepts that # argument. # Also note the `.apply` generator which generates a TRUE/FALSE scalar # for whether the transformation should be applied to the current group pipe$add_transformation( fn = cluster_groups, args = list( cols = c("Index", "A"), suffix = "", overwrite = TRUE, group_cols = "G" ), generators = list( multiplier = function() { 0.1 * runif(1) * 3 ^ sample.int(5, 1) }, .apply = function(){sample(c(TRUE, FALSE), 1)} ), name = "Cluster" ) # Check pipeline object pipe # Apply pipeline to data.frame # Enable `verbose` to print progress pipe$apply(df, verbose = TRUE) ## ------------------------------------------------ ## Method `GeneratedPipeline$add_transformation` ## ------------------------------------------------ # `generators` is a list of functions for generating # argument values for a chosen set of arguments # `.apply` can be used to disable the transformation generators = list(degrees = function(){sample.int(360, 1)}, origin = function(){rnorm(2)}, .apply = function(){sample(c(TRUE, FALSE), 1)})
Container for the type of transformation used in
GeneratedPipeline
.
Note: For internal use.
rearrr::Transformation
-> GeneratedTransformation
name
Name of transformation.
fn
Transformation function.
args
list
of constant arguments for `fn`
.
generators
list
of generator functions
for generating argument values.
apply_generator
Generator function for deciding whether to apply the transformation to the current group.
new()
Initialize transformation.
GeneratedTransformation$new( fn, args, generators, name = NULL, group_cols = NULL )
fn
Transformation function.
args
list
of constant arguments for `fn`
.
generators
Named list
of functions for generating argument values
for a single call of `fn`
.
It is possible to include an apply generator for deciding whether
the transformation should be applied to the current group or not.
This is done by adding a function with the name `.apply`
to the `generators`
list.
E.g. ".apply" = function(){sample(c(TRUE, FALSE), 1)}
.
name
Name of transformation.
group_cols
Names of columns to group data.frame
by before applying `fn`
.
When `NULL`
, the data.frame
is not grouped.
get_group_args()
Get arguments for a group.
GeneratedTransformation$get_group_args()
list
of arguments (both constant and generated).
generate_args()
Generate arguments for a group with the `generators`
.
GeneratedTransformation$generate_args()
list
of generated arguments.
Does not include the constant arguments.
print()
Print an overview of the transformation.
GeneratedTransformation$print(..., indent = 0, show_class = TRUE)
...
further arguments passed to or from other methods.
indent
How many spaces to indent when printing.
show_class
Whether to print the transformation class name.
The pipeline. To allow chaining of methods.
clone()
The objects of this class are cloneable with this method.
GeneratedTransformation$clone(deep = FALSE)
deep
Whether to make a deep clone.
Ludvig Renbo Olsen, [email protected]
Other transformation classes:
FixedGroupsTransformation
,
Transformation
Create the x-coordinates for a vector
of y-coordinates such that
they form a hexagon.
This will likely look most like a hexagon when the y-coordinates are somewhat equally distributed, e.g. a uniform distribution.
hexagonalize( data, y_col = NULL, .min = NULL, .max = NULL, offset_x = 0, keep_original = TRUE, x_col_name = ".hexagon_x", edge_col_name = ".edge", overwrite = FALSE )
hexagonalize( data, y_col = NULL, .min = NULL, .max = NULL, offset_x = 0, keep_original = TRUE, x_col_name = ".hexagon_x", edge_col_name = ".edge", overwrite = FALSE )
data |
|
y_col |
Name of column in |
.min |
Minimum y-coordinate. If |
.max |
Maximum y-coordinate. If |
offset_x |
Value to offset the x-coordinates by. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
x_col_name |
Name of new column with the x-coordinates. |
edge_col_name |
Name of new column with the edge identifiers. If Numbering is clockwise and starts at the upper-right edge. |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
data.frame
(tibble
) with the added x-coordinates and an identifier
for the edge the data point is a part of.
Ludvig Renbo Olsen, [email protected]
Other forming functions:
circularize()
,
square()
,
triangularize()
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "y" = runif(200), "g" = factor(rep(1:5, each = 40)) ) # Hexagonalize 'y' df_hex <- hexagonalize(df, y_col = "y") df_hex # Plot hexagon if (has_ggplot){ df_hex %>% ggplot(aes(x = .hexagon_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Grouped hexagonalization # # Hexagonalize 'y' for each group # First cluster the groups a bit to move the # hexagons away from each other df_hex <- df %>% cluster_groups( cols = "y", group_cols = "g", suffix = "", overwrite = TRUE ) %>% dplyr::group_by(g) %>% hexagonalize( y_col = "y", overwrite = TRUE ) # Plot hexagons if (has_ggplot){ df_hex %>% ggplot(aes(x = .hexagon_x, y = y, color = g)) + geom_point() + theme_minimal() } # # Specifying minimum value # # Specify minimum value manually df_hex <- hexagonalize(df, y_col = "y", .min = -2) df_hex # Plot hexagon if (has_ggplot){ df_hex %>% ggplot(aes(x = .hexagon_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Multiple hexagons by contraction # # Start by hexagonalizing 'y' df_hex <- hexagonalize(df, y_col = "y") # Contract '.hexagon_x' and 'y' towards the centroid # To contract with multiple multipliers at once, # we wrap the call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 0.75, 0.5, 0.25, 0.125), .f = function(mult) { expand_distances( data = df_hex, cols = c(".hexagon_x", "y"), multiplier = mult, origin_fn = centroid, overwrite = TRUE ) } ) df_expanded if (has_ggplot){ df_expanded %>% ggplot(aes( x = .hexagon_x_expanded, y = y_expanded, color = .edge, alpha = .multiplier )) + geom_point() + theme_minimal() }
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "y" = runif(200), "g" = factor(rep(1:5, each = 40)) ) # Hexagonalize 'y' df_hex <- hexagonalize(df, y_col = "y") df_hex # Plot hexagon if (has_ggplot){ df_hex %>% ggplot(aes(x = .hexagon_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Grouped hexagonalization # # Hexagonalize 'y' for each group # First cluster the groups a bit to move the # hexagons away from each other df_hex <- df %>% cluster_groups( cols = "y", group_cols = "g", suffix = "", overwrite = TRUE ) %>% dplyr::group_by(g) %>% hexagonalize( y_col = "y", overwrite = TRUE ) # Plot hexagons if (has_ggplot){ df_hex %>% ggplot(aes(x = .hexagon_x, y = y, color = g)) + geom_point() + theme_minimal() } # # Specifying minimum value # # Specify minimum value manually df_hex <- hexagonalize(df, y_col = "y", .min = -2) df_hex # Plot hexagon if (has_ggplot){ df_hex %>% ggplot(aes(x = .hexagon_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Multiple hexagons by contraction # # Start by hexagonalizing 'y' df_hex <- hexagonalize(df, y_col = "y") # Contract '.hexagon_x' and 'y' towards the centroid # To contract with multiple multipliers at once, # we wrap the call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 0.75, 0.5, 0.25, 0.125), .f = function(mult) { expand_distances( data = df_hex, cols = c(".hexagon_x", "y"), multiplier = mult, origin_fn = centroid, overwrite = TRUE ) } ) df_expanded if (has_ggplot){ df_expanded %>% ggplot(aes( x = .hexagon_x_expanded, y = y_expanded, color = .edge, alpha = .multiplier )) + geom_point() + theme_minimal() }
Finds the data point with the shortest distance to the centroid.
To get the coordinates of the most centered data point,
use most_centered()
instead.
is_most_centered(..., na.rm = FALSE)
is_most_centered(..., na.rm = FALSE)
... |
|
na.rm |
Whether to ignore missing values. At least one data point must be complete. (Logical) |
Logical vector (TRUE
/FALSE
) indicating if a data point is the most centered.
Ludvig Renbo Olsen, [email protected]
Other coordinate functions:
centroid()
,
create_origin_fn()
,
midrange()
,
most_centered()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Find the data point closest to the centroid is_most_centered(x, y, z) # Compare to coordinates for the most centered most_centered(x, y, z) # # For data.frames # # Create data frame df <- data.frame( "x" = x, "y" = y, "z" = z, "g" = rep(1:2, each = 5) ) # Filter the data points # closest to the centroid df %>% dplyr::filter(is_most_centered(x, y, z)) # When 'df' is grouped df %>% dplyr::group_by(g) %>% dplyr::filter(is_most_centered(x, y, z)) # Add as column df %>% dplyr::group_by(g) %>% dplyr::mutate(mc = is_most_centered(x, y, z))
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Find the data point closest to the centroid is_most_centered(x, y, z) # Compare to coordinates for the most centered most_centered(x, y, z) # # For data.frames # # Create data frame df <- data.frame( "x" = x, "y" = y, "z" = z, "g" = rep(1:2, each = 5) ) # Filter the data points # closest to the centroid df %>% dplyr::filter(is_most_centered(x, y, z)) # When 'df' is grouped df %>% dplyr::group_by(g) %>% dplyr::filter(is_most_centered(x, y, z)) # Add as column df %>% dplyr::group_by(g) %>% dplyr::mutate(mc = is_most_centered(x, y, z))
Applies function to the indices of each vector
in `...`
.
These functions were created with create_n_fn()
.
median_index(..., negate = FALSE, round_fn = round) quantile_index(..., prob, type = 7, negate = FALSE, round_fn = round)
median_index(..., negate = FALSE, round_fn = round) quantile_index(..., prob, type = 7, negate = FALSE, round_fn = round)
... |
Numeric |
negate |
Whether to negate the result. I.e. to multiply it with |
round_fn |
Function for rounding output. Rounding is done prior to negation. |
prob |
Probability in |
type |
Quantile algorithm to use. See |
numeric vector
with one element per supplied vector.
Ludvig Renbo Olsen, [email protected]
Other n functions:
create_n_fn()
# Attach packages library(rearrr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(15) z <- runif(20) median_index(x, y, z) quantile_index(x, y, z, prob = 0.2) # Negate result median_index(x, y, z, negate = TRUE)
# Attach packages library(rearrr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(15) z <- runif(20) median_index(x, y, z) quantile_index(x, y, z, prob = 0.2) # Negate result median_index(x, y, z, negate = TRUE)
Calculates the midrange for each of the passed vectors
/columns
.
Midrange is defined as:
midrange(..., cols = NULL, na.rm = FALSE)
midrange(..., cols = NULL, na.rm = FALSE)
... |
|
cols |
Names of columns to use when |
na.rm |
Whether to ignore missing values when calculating
|
Either a vector
with the midrange of each supplied vector
or a data.frame
with the midrange of each supplied column along with any grouping variables.
Ludvig Renbo Olsen, [email protected]
Other coordinate functions:
centroid()
,
create_origin_fn()
,
is_most_centered()
,
most_centered()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Find midrange for each vector midrange(x, y, z) # # For data.frames # # Create data frame df <- data.frame( "x" = x, "y" = y, "z" = z, "g" = rep(1:2, each = 5) ) # Find midrange for each column midrange(df, cols = c("x", "y", "z")) # When 'df' is grouped df %>% dplyr::group_by(g) %>% midrange(cols = c("x", "y", "z"))
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Find midrange for each vector midrange(x, y, z) # # For data.frames # # Create data frame df <- data.frame( "x" = x, "y" = y, "z" = z, "g" = rep(1:2, each = 5) ) # Find midrange for each column midrange(df, cols = c("x", "y", "z")) # When 'df' is grouped df %>% dplyr::group_by(g) %>% midrange(cols = c("x", "y", "z"))
Scales the values to a range with MinMax scaling.
min_max_scale( x, new_min, new_max, old_min = NULL, old_max = NULL, na.rm = FALSE )
min_max_scale( x, new_min, new_max, old_min = NULL, old_max = NULL, na.rm = FALSE )
x |
Numeric |
new_min |
Minimum value of target range. |
new_max |
Maximum value of target range. |
old_min |
Minimum value of original range. If |
old_max |
Maximum value of original range. If |
na.rm |
Whether missing values should be removed when calculating N.B. Ignored when both |
Scaled version of `x`
.
Ludvig Renbo Olsen, [email protected]
Other scaling functions:
to_unit_length()
# Attach packages library(rearrr) # Set seed set.seed(1) # Create numeric vector x <- runif(10) # Scale min_max_scale(x, new_min = -1, new_max = 0) min_max_scale(x, new_min = -1, new_max = 0, old_max = 3)
# Attach packages library(rearrr) # Set seed set.seed(1) # Create numeric vector x <- runif(10) # Scale min_max_scale(x, new_min = -1, new_max = 0) min_max_scale(x, new_min = -1, new_max = 0, old_max = 3)
Returns the coordinates for the data point with the
shortest distance to the centroid
.
To get a logical vector (TRUE
/FALSE
) indicating
whether a data point is the most centered,
use is_most_centered()
.
most_centered(..., cols = NULL, na.rm = FALSE)
most_centered(..., cols = NULL, na.rm = FALSE)
... |
|
cols |
Names of columns to use when |
na.rm |
Whether to ignore missing values. At least one data point must be complete. (Logical) |
The coordinates for the data point closest to the centroid.
Either as a vector
or a data.frame
.
Ludvig Renbo Olsen, [email protected]
Other coordinate functions:
centroid()
,
create_origin_fn()
,
is_most_centered()
,
midrange()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Find coordinates of the data point # closest to the centroid most_centered(x, y, z) # Compare to centroid coordinates centroid(x, y, z) # # For data.frames # # Create data frame df <- data.frame( "x" = x, "y" = y, "z" = z, "g" = rep(1:2, each = 5) ) # Find coordinates of the data point # closest to the centroid most_centered(df, cols = c("x", "y", "z")) # When 'df' is grouped df %>% dplyr::group_by(g) %>% most_centered(cols = c("x", "y", "z")) # Filter to only include most centered data points df %>% dplyr::group_by(g) %>% dplyr::filter(is_most_centered(x, y, z))
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create three vectors x <- runif(10) y <- runif(10) z <- runif(10) # Find coordinates of the data point # closest to the centroid most_centered(x, y, z) # Compare to centroid coordinates centroid(x, y, z) # # For data.frames # # Create data frame df <- data.frame( "x" = x, "y" = y, "z" = z, "g" = rep(1:2, each = 5) ) # Find coordinates of the data point # closest to the centroid most_centered(df, cols = c("x", "y", "z")) # When 'df' is grouped df %>% dplyr::group_by(g) %>% most_centered(cols = c("x", "y", "z")) # Filter to only include most centered data points df %>% dplyr::group_by(g) %>% dplyr::filter(is_most_centered(x, y, z))
The values are paired/grouped such that the lowest and highest values form the first group, the second lowest and the second highest values form the second group, and so on. The values are then sorted by these groups/pairs.
When `data`
has an uneven number of rows, the `unequal_method`
determines which group should have only 1
element.
The *_vec()
version takes and returns a vector
.
Example:
The column values:
c(1, 2, 3, 4, 5, 6)
Creates the sorting factor:
c(1, 2, 3, 3, 2, 1)
And are ordered as:
c(1, 6, 2, 5, 3, 4)
pair_extremes( data, col = NULL, unequal_method = "middle", num_pairings = 1, balance = "mean", order_by_aggregates = FALSE, shuffle_members = FALSE, shuffle_pairs = FALSE, factor_name = ifelse(num_pairings == 1, ".pair", ".pairing"), overwrite = FALSE ) pair_extremes_vec( data, unequal_method = "middle", num_pairings = 1, balance = "mean", order_by_aggregates = FALSE, shuffle_members = FALSE, shuffle_pairs = FALSE )
pair_extremes( data, col = NULL, unequal_method = "middle", num_pairings = 1, balance = "mean", order_by_aggregates = FALSE, shuffle_members = FALSE, shuffle_pairs = FALSE, factor_name = ifelse(num_pairings == 1, ".pair", ".pairing"), overwrite = FALSE ) pair_extremes_vec( data, unequal_method = "middle", num_pairings = 1, balance = "mean", order_by_aggregates = FALSE, shuffle_members = FALSE, shuffle_pairs = FALSE )
data |
|
col |
Column to create sorting factor by.
When |
unequal_method |
Method for dealing with an
unequal number of rows/elements in One of: firstThe first group will have size Example: The ordered column values:
Creates the sorting factor:
And are ordered as:
middleThe middle group will have size Example: The ordered column values:
Creates the sorting factor:
And are ordered as:
lastThe last group will have size Example: The ordered column values:
Creates the sorting factor:
And are ordered as:
|
num_pairings |
Number of pairings to perform (recursively). At least Based on |
balance |
What to balance pairs for in a given secondary pairing.
Either The first pairing always pairs the actual element values. meanPairs have similar means. The values in the pairs from the previous pairing
are aggregated with spreadPairs have similar spread (e.g. standard deviations).
The values in the pairs from the previous pairing
are aggregated with min / maxPairs have similar minimum / maximum values. The values in the pairs from the previous pairing
are aggregated with |
order_by_aggregates |
Whether to order the pairs from initial pairings (first N.B. Only used when |
shuffle_members |
Whether to shuffle the order of the group members within the groups. (Logical) |
shuffle_pairs |
Whether to shuffle the order of the pairs. Pair members remain together. (Logical) |
factor_name |
Name of new column with the sorting factor.
If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
The sorted data.frame
(tibble
) / vector
.
Optionally with the sorting factor added.
When `data`
is a vector
and `keep_factors`
is FALSE
,
the output will be a vector
. Otherwise, a data.frame
.
Ludvig Renbo Olsen, [email protected]
Other rearrange functions:
center_max()
,
center_min()
,
closest_to()
,
furthest_from()
,
position_max()
,
position_min()
,
rev_windows()
,
roll_elements()
,
shuffle_hierarchy()
,
triplet_extremes()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Pair extreme indices (row numbers) pair_extremes(df) # Pair extremes in each of the columns pair_extremes(df, col = "A")$A pair_extremes(df, col = "B")$B pair_extremes(df, col = "C")$C # Shuffle the members pair-wise # The rows within each pair are shuffled # while the `.pair` column maintains it order pair_extremes(df, col = "A", shuffle_members = TRUE) # Shuffle the order of the pairs # The rows within each pair maintain their order # and stay together but the `.pair` column is shuffled pair_extremes(df, col = "A", shuffle_pairs = TRUE) # Use recursive pairing # Mostly meaningful with much larger datasets # Order initial grouping by pair identifiers pair_extremes(df, col = "A", num_pairings = 2) # Order initial grouping by aggregate values pair_extremes(df, col = "A", num_pairings = 2, order_by_aggregates = TRUE) # Grouped by G # Each G group only has 3 elements # so it only creates 1 pair and a group # with the single excessive element # per G group df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% pair_extremes(col = "A") # Plot the extreme pairs plot( x = 1:10, y = pair_extremes(df, col = "B")$B, col = as.character(rep(1:5, each = 2)) ) # With shuffled pair members (run a few times) plot( x = 1:10, y = pair_extremes(df, col = "B", shuffle_members = TRUE)$B, col = as.character(rep(1:5, each = 2)) ) # With shuffled pairs (run a few times) plot( x = rep(1:5, each = 2), y = pair_extremes(df, col = "B", shuffle_pairs = TRUE)$B, col = as.character(rep(1:5, each = 2)) )
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Pair extreme indices (row numbers) pair_extremes(df) # Pair extremes in each of the columns pair_extremes(df, col = "A")$A pair_extremes(df, col = "B")$B pair_extremes(df, col = "C")$C # Shuffle the members pair-wise # The rows within each pair are shuffled # while the `.pair` column maintains it order pair_extremes(df, col = "A", shuffle_members = TRUE) # Shuffle the order of the pairs # The rows within each pair maintain their order # and stay together but the `.pair` column is shuffled pair_extremes(df, col = "A", shuffle_pairs = TRUE) # Use recursive pairing # Mostly meaningful with much larger datasets # Order initial grouping by pair identifiers pair_extremes(df, col = "A", num_pairings = 2) # Order initial grouping by aggregate values pair_extremes(df, col = "A", num_pairings = 2, order_by_aggregates = TRUE) # Grouped by G # Each G group only has 3 elements # so it only creates 1 pair and a group # with the single excessive element # per G group df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% pair_extremes(col = "A") # Plot the extreme pairs plot( x = 1:10, y = pair_extremes(df, col = "B")$B, col = as.character(rep(1:5, each = 2)) ) # With shuffled pair members (run a few times) plot( x = 1:10, y = pair_extremes(df, col = "B", shuffle_members = TRUE)$B, col = as.character(rep(1:5, each = 2)) ) # With shuffled pairs (run a few times) plot( x = rep(1:5, each = 2), y = pair_extremes(df, col = "B", shuffle_pairs = TRUE)$B, col = as.character(rep(1:5, each = 2)) )
Build a pipeline of transformations to be applied sequentially.
Uses the same arguments for all groups in `data`
.
Groupings are reset between each transformation. See group_cols
.
Standard workflow: Instantiate pipeline -> Add transformations -> Apply to data
To apply different argument values to each group, see
GeneratedPipeline
for generating
argument values for an arbitrary number of groups and
FixedGroupsPipeline
for specifying
specific values for a fixed set of groups.
transformations
list
of transformations to apply.
names
Names of the transformations.
add_transformation()
Add a transformation to the pipeline.
Pipeline$add_transformation(fn, args, name, group_cols = NULL)
fn
Function that performs the transformation.
args
Named list
with arguments for the `fn`
function.
name
Name of the transformation step. Must be unique.
group_cols
Names of the columns to group the input data by before applying the transformation.
Note that the transformation function is applied separately to each group (subset).
If the `fn`
function requires access to the entire data.frame
, the
grouping columns should be specified as part of `args`
and
handled by the `fn`
function.
The pipeline. To allow chaining of methods.
apply()
Apply the pipeline to a data.frame
.
Pipeline$apply(data, verbose = FALSE)
data
data.frame
.
A grouped data.frame
will raise a warning and the grouping will be ignored.
Use the `group_cols`
argument in the `add_transformation`
method to
specify how `data`
should be grouped for each transformation.
verbose
Whether to print the progress.
Transformed version of `data`
.
print()
Print an overview of the pipeline.
Pipeline$print(...)
...
further arguments passed to or from other methods.
The pipeline. To allow chaining of methods.
clone()
The objects of this class are cloneable with this method.
Pipeline$clone(deep = FALSE)
deep
Whether to make a deep clone.
Ludvig Renbo Olsen, [email protected]
Other pipelines:
FixedGroupsPipeline
,
GeneratedPipeline
# Attach package library(rearrr) # Create a data frame df <- data.frame( "Index" = 1:12, "A" = c(1:4, 9:12, 15:18), "G" = rep(1:3, each = 4) ) # Create new pipeline pipe <- Pipeline$new() # Add 2D rotation transformation # Note that we specify the grouping via `group_cols` pipe$add_transformation( fn = rotate_2d, args = list( x_col = "Index", y_col = "A", origin = c(0, 0), degrees = 45, suffix = "", overwrite = TRUE ), name = "Rotate", group_cols = "G" ) # Add the `cluster_group` transformation # Note that this function requires the entire input data # to properly scale the groups. We therefore specify `group_cols` # as part of `args`. This works as `cluster_groups()` accepts that # argument. pipe$add_transformation( fn = cluster_groups, args = list( cols = c("Index", "A"), suffix = "", overwrite = TRUE, multiplier = 0.05, group_cols = "G" ), name = "Cluster" ) # Check pipeline object pipe # Apply pipeline to data.frame # Enable `verbose` to print progress pipe$apply(df, verbose = TRUE)
# Attach package library(rearrr) # Create a data frame df <- data.frame( "Index" = 1:12, "A" = c(1:4, 9:12, 15:18), "G" = rep(1:3, each = 4) ) # Create new pipeline pipe <- Pipeline$new() # Add 2D rotation transformation # Note that we specify the grouping via `group_cols` pipe$add_transformation( fn = rotate_2d, args = list( x_col = "Index", y_col = "A", origin = c(0, 0), degrees = 45, suffix = "", overwrite = TRUE ), name = "Rotate", group_cols = "G" ) # Add the `cluster_group` transformation # Note that this function requires the entire input data # to properly scale the groups. We therefore specify `group_cols` # as part of `args`. This works as `cluster_groups()` accepts that # argument. pipe$add_transformation( fn = cluster_groups, args = list( cols = c("Index", "A"), suffix = "", overwrite = TRUE, multiplier = 0.05, group_cols = "G" ), name = "Cluster" ) # Check pipeline object pipe # Apply pipeline to data.frame # Enable `verbose` to print progress pipe$apply(df, verbose = TRUE)
The highest value is positioned at the given index/quantile with the other values decreasing around it.
Example:
The column values:
c(1, 2, 3, 4,
5
)
and position = 2
are ordered as:
c(3,
5
, 4, 2, 1)
position_max(data, col = NULL, position = NULL, shuffle_sides = FALSE)
position_max(data, col = NULL, position = NULL, shuffle_sides = FALSE)
data |
|
col |
Column to create sorting factor by.
When |
position |
Index or quantile (in |
shuffle_sides |
Whether to shuffle which elements are left and right of the position. (Logical) |
The sorted data.frame
(tibble
) / vector
.
Ludvig Renbo Olsen, [email protected]
Other rearrange functions:
center_max()
,
center_min()
,
closest_to()
,
furthest_from()
,
pair_extremes()
,
position_min()
,
rev_windows()
,
roll_elements()
,
shuffle_hierarchy()
,
triplet_extremes()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Position the highest index (row number) position_max(df, position = 3)$index position_max(df, position = 8)$index # Position the maximum value in each of the columns position_max(df, col = "A", position = 3)$A position_max(df, col = "B", position = 3)$B position_max(df, col = "C", position = 3)$C # Randomize which elements are left and right of the position position_max(df, col = "A", position = 3, shuffle_sides = TRUE)$A # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% position_max(col = "A", position = 2) # Plot the rearranged values plot(x = 1:10, y = position_max(df, col = "B", position = 3)$B) plot(x = 1:10, y = position_max(df, col = "B", position = 3, shuffle_sides = TRUE)$B)
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Position the highest index (row number) position_max(df, position = 3)$index position_max(df, position = 8)$index # Position the maximum value in each of the columns position_max(df, col = "A", position = 3)$A position_max(df, col = "B", position = 3)$B position_max(df, col = "C", position = 3)$C # Randomize which elements are left and right of the position position_max(df, col = "A", position = 3, shuffle_sides = TRUE)$A # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% position_max(col = "A", position = 2) # Plot the rearranged values plot(x = 1:10, y = position_max(df, col = "B", position = 3)$B) plot(x = 1:10, y = position_max(df, col = "B", position = 3, shuffle_sides = TRUE)$B)
The lowest value is positioned at the given index/quantile with the other values increasing around it.
Example:
The column values:
c(
1
, 2, 3, 4, 5)
and position = 2
are ordered as:
c(3,
1
, 2, 4, 5)
position_min(data, col = NULL, position = NULL, shuffle_sides = FALSE)
position_min(data, col = NULL, position = NULL, shuffle_sides = FALSE)
data |
|
col |
Column to create sorting factor by.
When |
position |
Index or quantile (in |
shuffle_sides |
Whether to shuffle which elements are left and right of the position. (Logical) |
The sorted data.frame
(tibble
) / vector
.
Ludvig Renbo Olsen, [email protected]
Other rearrange functions:
center_max()
,
center_min()
,
closest_to()
,
furthest_from()
,
pair_extremes()
,
position_max()
,
rev_windows()
,
roll_elements()
,
shuffle_hierarchy()
,
triplet_extremes()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Position the smallest index (row number) position_min(df, position = 3)$index position_min(df, position = 8)$index # Position the minimum value in each of the columns position_min(df, col = "A", position = 3)$A position_min(df, col = "B", position = 3)$B position_min(df, col = "C", position = 3)$C # Randomize which elements are left and right of the position position_min(df, col = "A", position = 3, shuffle_sides = TRUE)$A # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% position_min(col = "A", position = 2) # Plot the rearranged values plot(x = 1:10, y = position_min(df, col = "B", position = 3)$B) plot(x = 1:10, y = position_min(df, col = "B", position = 3, shuffle_sides = TRUE)$B)
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = c( 1, 1, 1, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Position the smallest index (row number) position_min(df, position = 3)$index position_min(df, position = 8)$index # Position the minimum value in each of the columns position_min(df, col = "A", position = 3)$A position_min(df, col = "B", position = 3)$B position_min(df, col = "C", position = 3)$C # Randomize which elements are left and right of the position position_min(df, col = "A", position = 3, shuffle_sides = TRUE)$A # Grouped by G df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% position_min(col = "A", position = 2) # Plot the rearranged values plot(x = 1:10, y = position_min(df, col = "B", position = 3)$B) plot(x = 1:10, y = position_min(df, col = "B", position = 3, shuffle_sides = TRUE)$B)
The values are windowed and reversed within windows.
The *_vec()
version takes and returns a vector
.
Example:
The column values:
c(1, 2, 3, 4, 5, 6)
With window_size = 3
Are ordered as:
c(3, 2, 1, 6, 4, 5)
rev_windows(data, window_size, factor_name = ".window", overwrite = FALSE) rev_windows_vec(data, window_size)
rev_windows(data, window_size, factor_name = ".window", overwrite = FALSE) rev_windows_vec(data, window_size)
data |
|
window_size |
Size of the windows. (Logical) |
factor_name |
Name of the factor with window identifiers.
If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
The sorted data.frame
(tibble
) / vector
.
Optionally with the windows factor added.
When `data`
is a vector
and `keep_windows`
is FALSE
,
the output will be a vector
. Otherwise, a data.frame
.
Ludvig Renbo Olsen, [email protected]
Other rearrange functions:
center_max()
,
center_min()
,
closest_to()
,
furthest_from()
,
pair_extremes()
,
position_max()
,
position_min()
,
roll_elements()
,
shuffle_hierarchy()
,
triplet_extremes()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = rep(1:2, each = 5), stringsAsFactors = FALSE ) # For vector rev_windows_vec(1:10, window_size = 3) # For data frame rev_windows(df, window_size = 3) rev_windows(df, window_size = 3, factor_name = NULL) # Grouped by G df %>% dplyr::select(G, index) %>% # For clarity dplyr::group_by(G) %>% rev_windows(window_size = 3) # Plot the extreme pairs plot( x = 1:10, y = rev_windows_vec(1:10, window_size = 3) )
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:10, "A" = sample(1:10), "B" = runif(10), "C" = LETTERS[1:10], "G" = rep(1:2, each = 5), stringsAsFactors = FALSE ) # For vector rev_windows_vec(1:10, window_size = 3) # For data frame rev_windows(df, window_size = 3) rev_windows(df, window_size = 3, factor_name = NULL) # Grouped by G df %>% dplyr::select(G, index) %>% # For clarity dplyr::group_by(G) %>% rev_windows(window_size = 3) # Plot the extreme pairs plot( x = 1:10, y = rev_windows_vec(1:10, window_size = 3) )
Rolls positions of elements.
Example:
Rolling c(1, 2, 3, 4, 5)
with `n = 2`
becomes:
c(3, 4, 5, 1, 2)
roll_elements_vec()
takes and returns a vector
.
Should not be confused with roll_values()
,
which changes the values of the elements and wraps to a given range.
roll_elements( data, cols = NULL, n = NULL, n_fn = NULL, n_col_name = ".n", overwrite = FALSE, ... ) roll_elements_vec(data, n = NULL, n_fn = NULL, ...)
roll_elements( data, cols = NULL, n = NULL, n_fn = NULL, n_col_name = ".n", overwrite = FALSE, ... ) roll_elements_vec(data, n = NULL, n_fn = NULL, ...)
data |
|
cols |
Names of columns in N.B. only used when |
n |
Number of positions to roll. A positive number rolls left/up. A negative number rolls right/down. |
n_fn |
Function to find Input: Each specified Output: It should return either a Can be created with |
n_col_name |
Name of new column with the applied |
overwrite |
Whether to allow overwriting of columns with the
same name as |
... |
Extra arguments for |
Rolled `data`
.
Ludvig Renbo Olsen, [email protected]
Other roll functions:
roll_values()
Other rearrange functions:
center_max()
,
center_min()
,
closest_to()
,
furthest_from()
,
pair_extremes()
,
position_max()
,
position_min()
,
rev_windows()
,
shuffle_hierarchy()
,
triplet_extremes()
# Attach packages library(rearrr) library(dplyr) # Roll vector left roll_elements(1:10, n = 2) # Roll vector right and return the vector roll_elements_vec(1:10, n = -2) # Roll vector left by median index (rounded to 6) roll_elements(3:12, n_fn = median_index) # Roll vector right by median value (rounded to 8) roll_elements(3:12, n_fn = create_n_fn(median, negate = TRUE)) # Pass extra arguments (here 'prob') to 'n_fn' via '...' roll_elements( 1:10, n_fn = quantile_index, prob = 0.2 ) # # Roll data.frame # # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = 1:20, "y" = runif(20) * 10, "g" = rep(1:4, each = 5) ) # Roll rows left/up roll_elements(df, n = 2) # Roll rows right/down roll_elements(df, n = -2) # Roll 'x' column right/down roll_elements(df, cols = "x", n = -2) # Roll rows right by median index in each group # Specify 'negate' for the 'median_index' function roll_elements( df %>% dplyr::group_by(g), n_fn = median_index, negate = TRUE )
# Attach packages library(rearrr) library(dplyr) # Roll vector left roll_elements(1:10, n = 2) # Roll vector right and return the vector roll_elements_vec(1:10, n = -2) # Roll vector left by median index (rounded to 6) roll_elements(3:12, n_fn = median_index) # Roll vector right by median value (rounded to 8) roll_elements(3:12, n_fn = create_n_fn(median, negate = TRUE)) # Pass extra arguments (here 'prob') to 'n_fn' via '...' roll_elements( 1:10, n_fn = quantile_index, prob = 0.2 ) # # Roll data.frame # # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = 1:20, "y" = runif(20) * 10, "g" = rep(1:4, each = 5) ) # Roll rows left/up roll_elements(df, n = 2) # Roll rows right/down roll_elements(df, n = -2) # Roll 'x' column right/down roll_elements(df, cols = "x", n = -2) # Roll rows right by median index in each group # Specify 'negate' for the 'median_index' function roll_elements( df %>% dplyr::group_by(g), n_fn = median_index, negate = TRUE )
Adds a specified value to each element in the vector and wraps the values around the min-max range with:
%
Useful when adding to the degrees of a circle, where the values should remain in the
0-360
range. A value larger than 360
will start over from 0
, e.g. ,
while a value smaller than
0
would become e.g. .
Here,
0
and 360
are considered the same angle.
If we were instead adding days to the weekdays 1-7
,
where 1
and 7
are separate days,
we can set `between = 1`
to have one day in-between them.
wrap_to_range()
is a wrapper with `add = 0`
.
The *_vec()
versions take and return a vector.
Should not be confused with roll_elements()
,
which changes the positions of the elements.
roll_values( data, cols = NULL, add = 0, .min = NULL, .max = NULL, between = 0, na.rm = FALSE, suffix = "_rolled", keep_original = TRUE, range_col_name = ".range", overwrite = FALSE ) wrap_to_range( data, cols = NULL, .min = NULL, .max = NULL, between = 0, na.rm = FALSE, suffix = "_wrapped", keep_original = TRUE, range_col_name = ".range", overwrite = FALSE ) roll_values_vec( data, add = 0, .min = NULL, .max = NULL, between = 0, na.rm = FALSE ) wrap_to_range_vec(data, .min = NULL, .max = NULL, between = 0, na.rm = FALSE)
roll_values( data, cols = NULL, add = 0, .min = NULL, .max = NULL, between = 0, na.rm = FALSE, suffix = "_rolled", keep_original = TRUE, range_col_name = ".range", overwrite = FALSE ) wrap_to_range( data, cols = NULL, .min = NULL, .max = NULL, between = 0, na.rm = FALSE, suffix = "_wrapped", keep_original = TRUE, range_col_name = ".range", overwrite = FALSE ) roll_values_vec( data, add = 0, .min = NULL, .max = NULL, between = 0, na.rm = FALSE ) wrap_to_range_vec(data, .min = NULL, .max = NULL, between = 0, na.rm = FALSE)
data |
|
cols |
Names of columns to roll/wrap in |
add |
Amount to add to each element. ( When |
.min |
Minimum value allowed. If |
.max |
Maximum value allowed. If |
between |
The wrapping distance between When When |
na.rm |
Whether to remove missing values ( |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
range_col_name |
Name of new column with the min-max range. If N.B. Ignored when |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
`data`
with new columns with values in the specified min-max range(s)
and columns with the applied ranges.
The *_vec()
versions return a vector
.
Ludvig Renbo Olsen, [email protected]
Other roll functions:
roll_elements()
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
flip_values()
,
rotate_2d()
,
rotate_3d()
,
shear_2d()
,
shear_3d()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) # Add 90 to all degrees # Note that 0 and 360 is considered the same angle # why there is no distance between the two roll_values(c(0:360), add = 90) # Get as vector roll_values_vec(c(0:360), add = 90) # Change limits to 0-180 # so e.g. 270 becomes 90 roll_values(c(0:360), .min = 0, .max = 180) # Change values first, then wrap to range x <- c(1:7) x <- x^2 wrap_to_range(x, .min = 1, .max = 7) # With 1 in-between .min and .max wrap_to_range(x, .min = 1, .max = 7, between = 1) # Get as vector wrap_to_range_vec(x, .min = 1, .max = 7, between = 1) # # Roll data.frame # # Set seed set.seed(1) # Create a data frame df <- data.frame( "w" = 1:7, "d" = c(0, 45, 90, 135, 180, 270, 360) ) # Roll weekdays by 1 day roll_values( df, cols = "w", add = 1, .min = 1, .max = 7, between = 1 ) # Roll degrees by -90 degrees roll_values( df, cols = "d", add = -90, .min = 0, .max = 360, between = 0 ) # Roll both weekdays and degrees by 1 # We don't specify .min and .max, so they # are based on the values in the columns # Note: This is not that meaningful but shows # the option roll_values( df, cols = c("w", "d"), add = 1 ) # Wrap weekdays to 2-5 range wrap_to_range( df, cols = "w", .min = 2, .max = 5, between = 1 )
# Attach packages library(rearrr) # Add 90 to all degrees # Note that 0 and 360 is considered the same angle # why there is no distance between the two roll_values(c(0:360), add = 90) # Get as vector roll_values_vec(c(0:360), add = 90) # Change limits to 0-180 # so e.g. 270 becomes 90 roll_values(c(0:360), .min = 0, .max = 180) # Change values first, then wrap to range x <- c(1:7) x <- x^2 wrap_to_range(x, .min = 1, .max = 7) # With 1 in-between .min and .max wrap_to_range(x, .min = 1, .max = 7, between = 1) # Get as vector wrap_to_range_vec(x, .min = 1, .max = 7, between = 1) # # Roll data.frame # # Set seed set.seed(1) # Create a data frame df <- data.frame( "w" = 1:7, "d" = c(0, 45, 90, 135, 180, 270, 360) ) # Roll weekdays by 1 day roll_values( df, cols = "w", add = 1, .min = 1, .max = 7, between = 1 ) # Roll degrees by -90 degrees roll_values( df, cols = "d", add = -90, .min = 0, .max = 360, between = 0 ) # Roll both weekdays and degrees by 1 # We don't specify .min and .max, so they # are based on the values in the columns # Note: This is not that meaningful but shows # the option roll_values( df, cols = c("w", "d"), add = 1 ) # Wrap weekdays to 2-5 range wrap_to_range( df, cols = "w", .min = 2, .max = 5, between = 1 )
The values are rotated counterclockwise around a specified origin.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and rotating around e.g. the centroid
of each group.
rotate_2d( data, degrees, x_col = NULL, y_col = NULL, suffix = "_rotated", origin = NULL, origin_fn = NULL, keep_original = TRUE, degrees_col_name = ".degrees", origin_col_name = ".origin", overwrite = FALSE )
rotate_2d( data, degrees, x_col = NULL, y_col = NULL, suffix = "_rotated", origin = NULL, origin_fn = NULL, keep_original = TRUE, degrees_col_name = ".degrees", origin_col_name = ".origin", overwrite = FALSE )
data |
|
degrees |
Degrees to rotate values counterclockwise. In |
x_col |
Name of x column in |
y_col |
Name of y column in |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
origin |
Coordinates of the origin to rotate around.
A |
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
degrees_col_name |
Name of new column with the degrees. If |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Applies the following rotation matrix:
[ |
, |
] |
[ |
, |
] |
That is:
Where is the angle in radians.
As specified at Wikipedia/Rotation_matrix.
data.frame
(tibble
) with seven new columns containing
the rotated x-,y- and z-values and the degrees, radiuses and origin coordinates.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
flip_values()
,
roll_values()
,
rotate_3d()
,
shear_2d()
,
shear_3d()
,
swirl_2d()
,
swirl_3d()
Other rotation functions:
rotate_3d()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "Index" = 1:12, "A" = c(1:4, 9:12, 15:18), "G" = rep(1:3, each = 4) ) # Rotate values around (0, 0) rotate_2d(df, degrees = 45, x_col = "Index", y_col = "A", origin = c(0, 0)) # Rotate A around the centroid df_rotated <- df %>% rotate_2d( x_col = "Index", y_col = "A", degrees = c(0, 120, 240), origin_fn = centroid ) df_rotated # Plot A and A rotated around overall centroid if (has_ggplot){ ggplot(df_rotated, aes(x = Index_rotated, y = A_rotated, color = factor(.degrees))) + geom_hline(yintercept = mean(df$A), size = 0.2, alpha = .4, linetype = "dashed") + geom_vline(xintercept = mean(df$Index), size = 0.2, alpha = .4, linetype = "dashed") + geom_line(alpha = .4) + geom_point() + theme_minimal() + labs(x = "Index", y = "Value", color = "Degrees") } # Rotate around group centroids df_grouped <- df %>% dplyr::group_by(G) %>% rotate_2d( x_col = "Index", y_col = "A", degrees = c(0, 120, 240), origin_fn = centroid ) df_grouped # Plot A and A rotated around group centroids if (has_ggplot){ ggplot(df_grouped, aes(x = Index_rotated, y = A_rotated, color = factor(.degrees))) + geom_point() + theme_minimal() + labs(x = "Index", y = "Value", color = "Degrees") }
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "Index" = 1:12, "A" = c(1:4, 9:12, 15:18), "G" = rep(1:3, each = 4) ) # Rotate values around (0, 0) rotate_2d(df, degrees = 45, x_col = "Index", y_col = "A", origin = c(0, 0)) # Rotate A around the centroid df_rotated <- df %>% rotate_2d( x_col = "Index", y_col = "A", degrees = c(0, 120, 240), origin_fn = centroid ) df_rotated # Plot A and A rotated around overall centroid if (has_ggplot){ ggplot(df_rotated, aes(x = Index_rotated, y = A_rotated, color = factor(.degrees))) + geom_hline(yintercept = mean(df$A), size = 0.2, alpha = .4, linetype = "dashed") + geom_vline(xintercept = mean(df$Index), size = 0.2, alpha = .4, linetype = "dashed") + geom_line(alpha = .4) + geom_point() + theme_minimal() + labs(x = "Index", y = "Value", color = "Degrees") } # Rotate around group centroids df_grouped <- df %>% dplyr::group_by(G) %>% rotate_2d( x_col = "Index", y_col = "A", degrees = c(0, 120, 240), origin_fn = centroid ) df_grouped # Plot A and A rotated around group centroids if (has_ggplot){ ggplot(df_grouped, aes(x = Index_rotated, y = A_rotated, color = factor(.degrees))) + geom_point() + theme_minimal() + labs(x = "Index", y = "Value", color = "Degrees") }
The values are rotated counterclockwise around a specified origin.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and rotating around e.g. the centroid
of each group.
rotate_3d( data, x_col, y_col, z_col, x_deg = 0, y_deg = 0, z_deg = 0, suffix = "_rotated", origin = NULL, origin_fn = NULL, keep_original = TRUE, degrees_col_name = ".degrees", origin_col_name = ".origin", overwrite = FALSE )
rotate_3d( data, x_col, y_col, z_col, x_deg = 0, y_deg = 0, z_deg = 0, suffix = "_rotated", origin = NULL, origin_fn = NULL, keep_original = TRUE, degrees_col_name = ".degrees", origin_col_name = ".origin", overwrite = FALSE )
data |
|
x_col , y_col , z_col
|
Name of x/y/z column in |
x_deg , y_deg , z_deg
|
Degrees to rotate values around the x/y/z-axis counterclockwise. In
|
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
origin |
Coordinates of the origin to rotate around.
|
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
degrees_col_name |
Name of new column with the degrees. If Also adds a string version with the same name + |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Applies the following rotation matrix:
[ |
, |
, |
] |
[ |
, |
, |
] |
[ |
, |
, |
] |
Where
`z_deg`
in radians,
`y_deg`
in radians,
`x_deg`
in radians.
As specified at Wikipedia/Rotation_matrix.
data.frame
(tibble
) with six new columns containing
the rotated x-,y- and z-values and the degrees and origin coordinates.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
flip_values()
,
roll_values()
,
rotate_2d()
,
shear_2d()
,
shear_3d()
,
swirl_2d()
,
swirl_3d()
Other rotation functions:
rotate_2d()
,
swirl_2d()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(3) # Create a data frame df <- data.frame( "x" = 1:12, "y" = c(1:4, 9:12, 15:18), "z" = runif(12), "g" = rep(1:3, each = 4) ) # Rotate values 45 degrees around x-axis at (0, 0, 0) rotate_3d(df, x_col = "x", y_col = "y", z_col = "z", x_deg = 45, origin = c(0, 0, 0)) # Rotate all axes around the centroid df_rotated <- df %>% rotate_3d( x_col = "x", y_col = "y", z_col = "z", x_deg = c(0, 72, 144, 216, 288), y_deg = c(0, 72, 144, 216, 288), z_deg = c(0, 72, 144, 216, 288), origin_fn = centroid ) df_rotated # Plot rotations if (has_ggplot){ ggplot(df_rotated, aes(x = x_rotated, y = y_rotated, color = .degrees_str, alpha = z_rotated)) + geom_vline(xintercept = mean(df$x), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$y), size = 0.2, alpha = .4, linetype = "dashed") + geom_line(alpha = .4) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "degrees", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_rotated$x_rotated, y = df_rotated$y_rotated, z = df_rotated$z_rotated, type = "scatter3d", mode = "markers", color = df_rotated$.degrees_str ) ## End(Not run) # Rotate randomly around all axes df_rotated <- df %>% rotate_3d( x_col = "x", y_col = "y", z_col = "z", x_deg = round(runif(10, min = 0, max = 360)), y_deg = round(runif(10, min = 0, max = 360)), z_deg = round(runif(10, min = 0, max = 360)), origin_fn = centroid ) df_rotated # Plot rotations if (has_ggplot){ ggplot(df_rotated, aes(x = x_rotated, y = y_rotated, color = .degrees_str, alpha = z_rotated)) + geom_vline(xintercept = mean(df$x), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$y), size = 0.2, alpha = .4, linetype = "dashed") + geom_line(alpha = .4) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "degrees", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_rotated$x_rotated, y = df_rotated$y_rotated, z = df_rotated$z_rotated, type = "scatter3d", mode = "markers", color = df_rotated$.degrees_str ) ## End(Not run) # Rotate around group centroids df_grouped <- df %>% dplyr::group_by(g) %>% rotate_3d( x_col = "x", y_col = "y", z_col = "z", x_deg = c(0, 72, 144, 216, 288), y_deg = c(0, 72, 144, 216, 288), z_deg = c(0, 72, 144, 216, 288), origin_fn = centroid ) # Plot A and A rotated around group centroids if (has_ggplot){ ggplot(df_grouped, aes(x = x_rotated, y = y_rotated, color = .degrees_str, alpha = z_rotated)) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "degrees", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_grouped$x_rotated, y = df_grouped$y_rotated, z = df_grouped$z_rotated, type = "scatter3d", mode = "markers", color = df_grouped$.degrees_str ) ## End(Not run)
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(3) # Create a data frame df <- data.frame( "x" = 1:12, "y" = c(1:4, 9:12, 15:18), "z" = runif(12), "g" = rep(1:3, each = 4) ) # Rotate values 45 degrees around x-axis at (0, 0, 0) rotate_3d(df, x_col = "x", y_col = "y", z_col = "z", x_deg = 45, origin = c(0, 0, 0)) # Rotate all axes around the centroid df_rotated <- df %>% rotate_3d( x_col = "x", y_col = "y", z_col = "z", x_deg = c(0, 72, 144, 216, 288), y_deg = c(0, 72, 144, 216, 288), z_deg = c(0, 72, 144, 216, 288), origin_fn = centroid ) df_rotated # Plot rotations if (has_ggplot){ ggplot(df_rotated, aes(x = x_rotated, y = y_rotated, color = .degrees_str, alpha = z_rotated)) + geom_vline(xintercept = mean(df$x), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$y), size = 0.2, alpha = .4, linetype = "dashed") + geom_line(alpha = .4) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "degrees", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_rotated$x_rotated, y = df_rotated$y_rotated, z = df_rotated$z_rotated, type = "scatter3d", mode = "markers", color = df_rotated$.degrees_str ) ## End(Not run) # Rotate randomly around all axes df_rotated <- df %>% rotate_3d( x_col = "x", y_col = "y", z_col = "z", x_deg = round(runif(10, min = 0, max = 360)), y_deg = round(runif(10, min = 0, max = 360)), z_deg = round(runif(10, min = 0, max = 360)), origin_fn = centroid ) df_rotated # Plot rotations if (has_ggplot){ ggplot(df_rotated, aes(x = x_rotated, y = y_rotated, color = .degrees_str, alpha = z_rotated)) + geom_vline(xintercept = mean(df$x), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$y), size = 0.2, alpha = .4, linetype = "dashed") + geom_line(alpha = .4) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "degrees", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_rotated$x_rotated, y = df_rotated$y_rotated, z = df_rotated$z_rotated, type = "scatter3d", mode = "markers", color = df_rotated$.degrees_str ) ## End(Not run) # Rotate around group centroids df_grouped <- df %>% dplyr::group_by(g) %>% rotate_3d( x_col = "x", y_col = "y", z_col = "z", x_deg = c(0, 72, 144, 216, 288), y_deg = c(0, 72, 144, 216, 288), z_deg = c(0, 72, 144, 216, 288), origin_fn = centroid ) # Plot A and A rotated around group centroids if (has_ggplot){ ggplot(df_grouped, aes(x = x_rotated, y = y_rotated, color = .degrees_str, alpha = z_rotated)) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "degrees", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_grouped$x_rotated, y = df_grouped$y_rotated, z = df_grouped$z_rotated, type = "scatter3d", mode = "markers", color = df_grouped$.degrees_str ) ## End(Not run)
Shear a set of 2d points around an origin. The shearing formulas (excluding the origin movements) is:
The data points in `data`
are moved prior to the shearing, to bring
the origin to 0
in all dimensions. After the shearing, the
inverse move is applied to bring the origin back to its original position.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and shearing around e.g. the centroid
of each group.
shear_2d( data, x_shear, y_shear = 0, x_col = NULL, y_col = NULL, suffix = "_sheared", origin = NULL, origin_fn = NULL, keep_original = TRUE, shear_col_name = ".shear", origin_col_name = ".origin", overwrite = FALSE )
shear_2d( data, x_shear, y_shear = 0, x_col = NULL, y_col = NULL, suffix = "_sheared", origin = NULL, origin_fn = NULL, keep_original = TRUE, shear_col_name = ".shear", origin_col_name = ".origin", overwrite = FALSE )
data |
|
x_shear |
Shear factor for the x dimension ( |
y_shear |
Shear factor for the y dimension ( |
x_col |
Name of x column in |
y_col |
Name of y column in |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
origin |
Coordinates of the origin to shear around.
|
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
shear_col_name |
Name of new column with the shearing factors.
If Also adds a string version with the same name + |
origin_col_name |
Name of new column with the origin coordinates.
If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
data.frame
(tibble
) with sheared columns,
the shearing factors and the origin coordinates.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
flip_values()
,
roll_values()
,
rotate_2d()
,
rotate_3d()
,
shear_3d()
,
swirl_2d()
,
swirl_3d()
Other shearing functions:
shear_3d()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Create a data frame df <- data.frame( "x" = rep(1:6, each = 2), "y" = rep(c(1, 4), 6), "g" = rep(1:2, each = 6) ) # Shear the x variable with regards to y # around the centroid df_sheared <- shear_2d( data = df, x_shear = 2.5, x_col = "x", y_col = "y", origin_fn = centroid ) # Plot sheared data # Black: original points # Red: sheared points if (has_ggplot){ df_sheared %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_sheared, y = y_sheared, color = "red")) + theme_minimal() } # Shear in both dimensions df_sheared <- shear_2d( data = df, x_shear = 2.5, y_shear = 2.5, x_col = "x", y_col = "y", origin_fn = centroid ) # Plot sheared data # Black: original points # Red: sheared points if (has_ggplot){ df_sheared %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_sheared,y = y_sheared, color = "red")) + theme_minimal() } # Shear grouped data frame # Affects the calculated origin df_sheared <- shear_2d( data = dplyr::group_by(df, g), x_shear = 2.5, x_col = "x", y_col = "y", origin_fn = centroid ) # Plot sheared data # Black: original points # Red: sheared points if (has_ggplot){ df_sheared %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_sheared, y = y_sheared, color = "red")) + theme_minimal() } # Shear a vector with multiple shearing factors shear_2d( data = c(1:10), x_shear = c(1, 1.5, 2, 2.5), origin = c(0, 0) )
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Create a data frame df <- data.frame( "x" = rep(1:6, each = 2), "y" = rep(c(1, 4), 6), "g" = rep(1:2, each = 6) ) # Shear the x variable with regards to y # around the centroid df_sheared <- shear_2d( data = df, x_shear = 2.5, x_col = "x", y_col = "y", origin_fn = centroid ) # Plot sheared data # Black: original points # Red: sheared points if (has_ggplot){ df_sheared %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_sheared, y = y_sheared, color = "red")) + theme_minimal() } # Shear in both dimensions df_sheared <- shear_2d( data = df, x_shear = 2.5, y_shear = 2.5, x_col = "x", y_col = "y", origin_fn = centroid ) # Plot sheared data # Black: original points # Red: sheared points if (has_ggplot){ df_sheared %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_sheared,y = y_sheared, color = "red")) + theme_minimal() } # Shear grouped data frame # Affects the calculated origin df_sheared <- shear_2d( data = dplyr::group_by(df, g), x_shear = 2.5, x_col = "x", y_col = "y", origin_fn = centroid ) # Plot sheared data # Black: original points # Red: sheared points if (has_ggplot){ df_sheared %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_sheared, y = y_sheared, color = "red")) + theme_minimal() } # Shear a vector with multiple shearing factors shear_2d( data = c(1:10), x_shear = c(1, 1.5, 2, 2.5), origin = c(0, 0) )
Shears points around an origin in 3-dimensional space. See applied shearing matrices under Details.
The data points in `data`
are moved prior to the shearing, to bring
the origin to 0
in all dimensions. After the shearing, the
inverse move is applied to bring the origin back to its original position.
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and shearing around e.g. the centroid
of each group.
shear_3d( data, x_col, y_col, z_col, x_shear = NULL, y_shear = NULL, z_shear = NULL, suffix = "_sheared", origin = NULL, origin_fn = NULL, keep_original = TRUE, shear_col_name = ".shear", origin_col_name = ".origin", overwrite = FALSE )
shear_3d( data, x_col, y_col, z_col, x_shear = NULL, y_shear = NULL, z_shear = NULL, suffix = "_sheared", origin = NULL, origin_fn = NULL, keep_original = TRUE, shear_col_name = ".shear", origin_col_name = ".origin", overwrite = FALSE )
data |
|
x_col , y_col , z_col
|
Name of x/y/z column in |
x_shear , y_shear , z_shear
|
Shear factor for the x/y/z dimension ( N.B. Exactly 2 of the dimensions must have shear factors specified. |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
origin |
Coordinates of the origin to shear around.
|
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
shear_col_name |
Name of new column with the shearing amounts. If Also adds a string version with the same name + |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Applies one of the following transformation matrices, depending on which two shearing amounts are specified:
Given `x_shear`
and `y_shear`
:
[ |
, |
, x_shear |
] |
[ |
, |
, y_shear |
] |
[ |
, |
, |
] |
Given `x_shear`
and `z_shear`
:
[ |
, x_shear |
, |
] |
[ |
, |
, |
] |
[ |
, z_shear |
, |
] |
Given `y_shear`
and `z_shear`
:
[ |
, |
, |
] |
[ y_shear |
, |
, |
] |
[ z_shear |
, |
, |
] |
data.frame
(tibble
) with six new columns containing
the sheared x-, y- and z-values and the shearing amounts and origin coordinates.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
flip_values()
,
roll_values()
,
rotate_2d()
,
rotate_3d()
,
shear_2d()
,
swirl_2d()
,
swirl_3d()
Other shearing functions:
shear_2d()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) df_square <- square(runif(100)) %>% rename(x = .square_x, y = Value) %>% mutate(z = 1) # Shear the x and z axes # around the centroid df_sheared <- shear_3d( data = df_square, x_col = "x", y_col = "y", z_col = "z", x_shear = 2, z_shear = 4, origin_fn = centroid ) # Plot sheared data # Black: original points # Red: sheared points if (has_ggplot){ df_sheared %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_sheared, y = y_sheared, color = "red")) + theme_minimal() } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_sheared$x_sheared, y = df_sheared$y_sheared, z = df_sheared$z_sheared, type = "scatter3d", mode = "markers", color = df_sheared$.shear_str ) ## End(Not run) # Shear the y and z axes # around the centroid df_sheared <- shear_3d( data = df_square, x_col = "x", y_col = "y", z_col = "z", y_shear = 2, z_shear = 4, origin_fn = centroid ) # Plot sheared data # Black: original points # Red: sheared points if (has_ggplot){ df_sheared %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_sheared, y = y_sheared, color = "red")) + theme_minimal() } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_sheared$x_sheared, y = df_sheared$y_sheared, z = df_sheared$z_sheared, type = "scatter3d", mode = "markers", color = df_sheared$.shear_str ) ## End(Not run) # Shear the y and z axes with multiple amounts at once # around the centroid df_sheared <- shear_3d( data = df_square, x_col = "x", y_col = "y", z_col = "z", y_shear = c(0, 2, 4), z_shear = c(0, 4, 6), origin_fn = centroid ) # Plot sheared data if (has_ggplot){ df_sheared %>% ggplot(aes(x = x_sheared, y = y_sheared, color = .shear_str)) + geom_point() + theme_minimal() } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_sheared$x_sheared, y = df_sheared$y_sheared, z = df_sheared$z_sheared, type = "scatter3d", mode = "markers", color = df_sheared$.shear_str ) ## End(Not run)
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) df_square <- square(runif(100)) %>% rename(x = .square_x, y = Value) %>% mutate(z = 1) # Shear the x and z axes # around the centroid df_sheared <- shear_3d( data = df_square, x_col = "x", y_col = "y", z_col = "z", x_shear = 2, z_shear = 4, origin_fn = centroid ) # Plot sheared data # Black: original points # Red: sheared points if (has_ggplot){ df_sheared %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_sheared, y = y_sheared, color = "red")) + theme_minimal() } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_sheared$x_sheared, y = df_sheared$y_sheared, z = df_sheared$z_sheared, type = "scatter3d", mode = "markers", color = df_sheared$.shear_str ) ## End(Not run) # Shear the y and z axes # around the centroid df_sheared <- shear_3d( data = df_square, x_col = "x", y_col = "y", z_col = "z", y_shear = 2, z_shear = 4, origin_fn = centroid ) # Plot sheared data # Black: original points # Red: sheared points if (has_ggplot){ df_sheared %>% ggplot(aes(x = x, y = y)) + geom_point() + geom_point(aes(x = x_sheared, y = y_sheared, color = "red")) + theme_minimal() } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_sheared$x_sheared, y = df_sheared$y_sheared, z = df_sheared$z_sheared, type = "scatter3d", mode = "markers", color = df_sheared$.shear_str ) ## End(Not run) # Shear the y and z axes with multiple amounts at once # around the centroid df_sheared <- shear_3d( data = df_square, x_col = "x", y_col = "y", z_col = "z", y_shear = c(0, 2, 4), z_shear = c(0, 4, 6), origin_fn = centroid ) # Plot sheared data if (has_ggplot){ df_sheared %>% ggplot(aes(x = x_sheared, y = y_sheared, color = .shear_str)) + geom_point() + theme_minimal() } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_sheared$x_sheared, y = df_sheared$y_sheared, z = df_sheared$z_sheared, type = "scatter3d", mode = "markers", color = df_sheared$.shear_str ) ## End(Not run)
Shuffles a tree/hierarchy of groups, one column at a time. The levels in the last ("leaf") column are shuffled first, then the second-last column, and so on. Elements of the same group are ordered sequentially.
shuffle_hierarchy( data, group_cols, cols_to_shuffle = group_cols, leaf_has_groups = TRUE )
shuffle_hierarchy( data, group_cols, cols_to_shuffle = group_cols, leaf_has_groups = TRUE )
data |
|
group_cols |
Names of columns making up the group hierarchy.
The last column is the leaf and is shuffled first (if also in |
cols_to_shuffle |
Names of columns to shuffle hierarchically.
By default, all the |
leaf_has_groups |
Whether the leaf column contains groups or values. (Logical) When the elements are group identifiers, they are ordered sequentially and shuffled together. When the elements are values, they are simply shuffled. |
The shuffled data.frame
(tibble
).
Ludvig Renbo Olsen, [email protected]
Other rearrange functions:
center_max()
,
center_min()
,
closest_to()
,
furthest_from()
,
pair_extremes()
,
position_max()
,
position_min()
,
rev_windows()
,
roll_elements()
,
triplet_extremes()
# Attach packages library(rearrr) library(dplyr) df <- data.frame( 'a' = rep(1:4, each = 4), 'b' = rep(1:8, each = 2), 'c' = 1:16 ) # Set seed for reproducibility set.seed(2) # Shuffle all columns shuffle_hierarchy(df, group_cols = c('a', 'b', 'c')) # Don't shuffle 'b' but keep grouping by it # So 'c' will be shuffled within each group in 'b' shuffle_hierarchy( data = df, group_cols = c('a', 'b', 'c'), cols_to_shuffle = c('a', 'c') ) # Shuffle 'b' as if it's not a group column # so elements are independent within their group # (i.e. same-valued elements are not necessarily ordered sequentially) shuffle_hierarchy(df, group_cols = c('a', 'b'), leaf_has_groups = FALSE)
# Attach packages library(rearrr) library(dplyr) df <- data.frame( 'a' = rep(1:4, each = 4), 'b' = rep(1:8, each = 2), 'c' = 1:16 ) # Set seed for reproducibility set.seed(2) # Shuffle all columns shuffle_hierarchy(df, group_cols = c('a', 'b', 'c')) # Don't shuffle 'b' but keep grouping by it # So 'c' will be shuffled within each group in 'b' shuffle_hierarchy( data = df, group_cols = c('a', 'b', 'c'), cols_to_shuffle = c('a', 'c') ) # Shuffle 'b' as if it's not a group column # so elements are independent within their group # (i.e. same-valued elements are not necessarily ordered sequentially) shuffle_hierarchy(df, group_cols = c('a', 'b'), leaf_has_groups = FALSE)
Create the x-coordinates for a vector
of y-coordinates such that
they form a rotated square.
This will likely look most like a square when the y-coordinates are somewhat equally distributed, e.g. a uniform distribution.
square( data, y_col = NULL, .min = NULL, .max = NULL, offset_x = 0, keep_original = TRUE, x_col_name = ".square_x", edge_col_name = ".edge", overwrite = FALSE )
square( data, y_col = NULL, .min = NULL, .max = NULL, offset_x = 0, keep_original = TRUE, x_col_name = ".square_x", edge_col_name = ".edge", overwrite = FALSE )
data |
|
y_col |
Name of column in |
.min |
Minimum y-coordinate. If |
.max |
Maximum y-coordinate. If |
offset_x |
Value to offset the x-coordinates by. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
x_col_name |
Name of new column with the x-coordinates. |
edge_col_name |
Name of new column with the edge identifiers. If Numbering is clockwise and starts at the upper-right edge. |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
data.frame
(tibble
) with the added x-coordinates and an identifier
for the edge the data point is a part of.
Ludvig Renbo Olsen, [email protected]
Other forming functions:
circularize()
,
hexagonalize()
,
triangularize()
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "y" = runif(200), "g" = factor(rep(1:5, each = 40)) ) # Square 'y' df_sq <- square(df, y_col = "y") df_sq # Plot square if (has_ggplot){ df_sq %>% ggplot(aes(x = .square_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Grouped squaring # # Square 'y' for each group # First cluster the groups a bit to move the # squares away from each other df_sq <- df %>% cluster_groups( cols = "y", group_cols = "g", suffix = "", overwrite = TRUE ) %>% dplyr::group_by(g) %>% square( y_col = "y", overwrite = TRUE ) # Plot squares if (has_ggplot){ df_sq %>% ggplot(aes(x = .square_x, y = y, color = g)) + geom_point() + theme_minimal() } # # Specifying minimum value # # Specify minimum value manually df_sq <- square(df, y_col = "y", .min = -2) df_sq # Plot square if (has_ggplot){ df_sq %>% ggplot(aes(x = .square_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Multiple squares by contraction # # Start by squaring 'y' df_sq <- square(df, y_col = "y") # Contract '.square_x' and 'y' towards the centroid # To contract with multiple multipliers at once, # we wrap the call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 0.75, 0.5, 0.25, 0.125), .f = function(mult) { expand_distances( data = df_sq, cols = c(".square_x", "y"), multiplier = mult, origin_fn = centroid, overwrite = TRUE ) } ) df_expanded if (has_ggplot){ df_expanded %>% ggplot(aes( x = .square_x_expanded, y = y_expanded, color = .edge, alpha = .multiplier )) + geom_point() + theme_minimal() }
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "y" = runif(200), "g" = factor(rep(1:5, each = 40)) ) # Square 'y' df_sq <- square(df, y_col = "y") df_sq # Plot square if (has_ggplot){ df_sq %>% ggplot(aes(x = .square_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Grouped squaring # # Square 'y' for each group # First cluster the groups a bit to move the # squares away from each other df_sq <- df %>% cluster_groups( cols = "y", group_cols = "g", suffix = "", overwrite = TRUE ) %>% dplyr::group_by(g) %>% square( y_col = "y", overwrite = TRUE ) # Plot squares if (has_ggplot){ df_sq %>% ggplot(aes(x = .square_x, y = y, color = g)) + geom_point() + theme_minimal() } # # Specifying minimum value # # Specify minimum value manually df_sq <- square(df, y_col = "y", .min = -2) df_sq # Plot square if (has_ggplot){ df_sq %>% ggplot(aes(x = .square_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Multiple squares by contraction # # Start by squaring 'y' df_sq <- square(df, y_col = "y") # Contract '.square_x' and 'y' towards the centroid # To contract with multiple multipliers at once, # we wrap the call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = c(1, 0.75, 0.5, 0.25, 0.125), .f = function(mult) { expand_distances( data = df_sq, cols = c(".square_x", "y"), multiplier = mult, origin_fn = centroid, overwrite = TRUE ) } ) df_expanded if (has_ggplot){ df_expanded %>% ggplot(aes( x = .square_x_expanded, y = y_expanded, color = .edge, alpha = .multiplier )) + geom_point() + theme_minimal() }
The values are swirled counterclockwise around a specified origin. The swirling is done by rotating around the origin with the degrees based on the distances to the origin as so:
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and swirling around e.g. the centroid
of each group.
swirl_2d( data, radius, x_col = NULL, y_col = NULL, suffix = "_swirled", origin = NULL, origin_fn = NULL, scale_fn = identity, keep_original = TRUE, degrees_col_name = ".degrees", radius_col_name = ".radius", origin_col_name = ".origin", overwrite = FALSE )
swirl_2d( data, radius, x_col = NULL, y_col = NULL, suffix = "_swirled", origin = NULL, origin_fn = NULL, scale_fn = identity, keep_original = TRUE, degrees_col_name = ".degrees", radius_col_name = ".radius", origin_col_name = ".origin", overwrite = FALSE )
data |
|
radius |
Radius of the most-inner swirl on the x-axis in the simplest case.
A negative number changes the direction to clockwise rotation.
Can be a Note: With a custom |
x_col |
Name of x column in |
y_col |
Name of y column in |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
origin |
Coordinates of the origin to swirl around.
|
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
scale_fn |
Function for scaling the distances before calculating the degrees. Input: A Output: A E.g.:
|
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
degrees_col_name |
Name of new column with the degrees. If |
radius_col_name |
Name of new column with the radius. If |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
data.frame
(tibble
) with three new columns containing
the swirled x- and y-values, the degrees, the radius, and the origin coordinates.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
flip_values()
,
roll_values()
,
rotate_2d()
,
rotate_3d()
,
shear_2d()
,
shear_3d()
,
swirl_3d()
Other rotation functions:
rotate_2d()
,
rotate_3d()
,
swirl_3d()
Other distance functions:
closest_to()
,
dim_values()
,
distance()
,
expand_distances()
,
expand_distances_each()
,
furthest_from()
,
swirl_3d()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(4) # Create a data frame df <- data.frame( "x" = 1:50, "y" = 1, "r1" = runif(50), "r2" = runif(50) * 35, "g" = rep(1:5, each = 10) ) # Swirl values around (0, 0) swirl_2d( data = df, radius = 45, x_col = "x", y_col = "y", origin = c(0, 0) ) # Swirl around the centroid # with 6 different radius settings # Scale the distances with custom function df_swirled <- swirl_2d( data = df, radius = c(95, 96, 97, 98, 99, 100), x_col = "x", y_col = "y", origin_fn = centroid, scale_fn = function(d) { d^1.6 } ) df_swirled # Plot swirls if (has_ggplot){ df_swirled %>% ggplot(aes(x = x_swirled, y = y_swirled, color = factor(.radius))) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = ".radius") } # # Swirl random data # The trick lies in finding the right radius # # Swirl the random columns df_swirled <- swirl_2d( data = df, radius = 5, x_col = "r1", y_col = "r2", origin_fn = centroid ) # Plot swirls if (has_ggplot){ df_swirled %>% ggplot(aes(x = r1_swirled, y = r2_swirled)) + geom_point() + theme_minimal() + labs(x = "r1", y = "r2") }
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(4) # Create a data frame df <- data.frame( "x" = 1:50, "y" = 1, "r1" = runif(50), "r2" = runif(50) * 35, "g" = rep(1:5, each = 10) ) # Swirl values around (0, 0) swirl_2d( data = df, radius = 45, x_col = "x", y_col = "y", origin = c(0, 0) ) # Swirl around the centroid # with 6 different radius settings # Scale the distances with custom function df_swirled <- swirl_2d( data = df, radius = c(95, 96, 97, 98, 99, 100), x_col = "x", y_col = "y", origin_fn = centroid, scale_fn = function(d) { d^1.6 } ) df_swirled # Plot swirls if (has_ggplot){ df_swirled %>% ggplot(aes(x = x_swirled, y = y_swirled, color = factor(.radius))) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = ".radius") } # # Swirl random data # The trick lies in finding the right radius # # Swirl the random columns df_swirled <- swirl_2d( data = df, radius = 5, x_col = "r1", y_col = "r2", origin_fn = centroid ) # Plot swirls if (has_ggplot){ df_swirled %>% ggplot(aes(x = r1_swirled, y = r2_swirled)) + geom_point() + theme_minimal() + labs(x = "r1", y = "r2") }
The values are swirled counterclockwise around a specified origin. The swirling is done by rotating around the origin, basing the degrees for each rotation-axis on the distances to the origin as so:
The origin can be supplied as coordinates or as a function that returns coordinates. The
latter can be useful when supplying a grouped data.frame
and swirling around e.g. the centroid
of each group.
swirl_3d( data, x_col, y_col, z_col, x_radius = 0, y_radius = 0, z_radius = 0, suffix = "_swirled", origin = NULL, origin_fn = NULL, scale_fn = identity, keep_original = TRUE, degrees_col_name = ".degrees", radius_col_name = ".radius", origin_col_name = ".origin", overwrite = FALSE )
swirl_3d( data, x_col, y_col, z_col, x_radius = 0, y_radius = 0, z_radius = 0, suffix = "_swirled", origin = NULL, origin_fn = NULL, scale_fn = identity, keep_original = TRUE, degrees_col_name = ".degrees", radius_col_name = ".radius", origin_col_name = ".origin", overwrite = FALSE )
data |
|
x_col , y_col , z_col
|
Name of x/y/z column in |
x_radius , y_radius , z_radius
|
Radiuses of the
most-inner swirls around each axis (in the simplest case).
Can be E.g. the Note: With a custom |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
origin |
Coordinates of the origin to swirl around.
|
origin_fn |
Function for finding the origin coordinates. Input: Each column will be passed as a Output: A Can be created with E.g. Built-in functions are |
scale_fn |
Function for scaling the distances before calculating the degrees. Input: A Output: A E.g.:
|
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
degrees_col_name |
Name of new column with the degrees. If Also adds a string version with the same name + |
radius_col_name |
Name of new column with the radiuses. If |
origin_col_name |
Name of new column with the origin coordinates. If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
data.frame
(tibble
) with new columns containing
the swirled x- and y-values, the degrees, the radiuses, and the origin coordinates.
Ludvig Renbo Olsen, [email protected]
Other mutate functions:
apply_transformation_matrix()
,
cluster_groups()
,
dim_values()
,
expand_distances()
,
expand_distances_each()
,
flip_values()
,
roll_values()
,
rotate_2d()
,
rotate_3d()
,
shear_2d()
,
shear_3d()
,
swirl_2d()
Other rotation functions:
rotate_2d()
,
rotate_3d()
,
swirl_2d()
Other distance functions:
closest_to()
,
dim_values()
,
distance()
,
expand_distances()
,
expand_distances_each()
,
furthest_from()
,
swirl_2d()
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(4) # Create a data frame df <- data.frame( "x" = 1:50, "y" = 1:50, "z" = 1:50, "r1" = runif(50), "r2" = runif(50) * 35, "o" = 1, "g" = rep(1:5, each = 10) ) # Swirl values around (0, 0, 0) swirl_3d( data = df, x_radius = 45, x_col = "x", y_col = "y", z_col = "z", origin = c(0, 0, 0) ) # Swirl around the centroid df_swirled <- swirl_3d( data = df, x_col = "x", y_col = "y", z_col = "z", x_radius = c(100, 0, 0), y_radius = c(0, 100, 0), z_radius = c(0, 0, 100), origin_fn = centroid ) df_swirled # Plot swirls if (has_ggplot){ ggplot(df_swirled, aes(x = x_swirled, y = y_swirled, color = .radius_str, alpha = z_swirled)) + geom_vline(xintercept = mean(df$x), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$y), size = 0.2, alpha = .4, linetype = "dashed") + geom_path(alpha = .4) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "radius", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_swirled$x_swirled, y = df_swirled$y_swirled, z = df_swirled$z_swirled, type = "scatter3d", mode = "markers", color = df_swirled$.radius_str ) ## End(Not run) # Swirl around the centroid df_swirled <- swirl_3d( data = df, x_col = "x", y_col = "y", z_col = "z", x_radius = c(50, 0, 0), y_radius = c(0, 50, 0), z_radius = c(0, 0, 50), origin_fn = centroid ) df_swirled # Plot swirls if (has_ggplot){ ggplot(df_swirled, aes(x = x_swirled, y = y_swirled, color = .radius_str, alpha = z_swirled)) + geom_vline(xintercept = mean(df$x), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$y), size = 0.2, alpha = .4, linetype = "dashed") + geom_path(alpha = .4) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "radius", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_swirled$x_swirled, y = df_swirled$y_swirled, z = df_swirled$z_swirled, type = "scatter3d", mode = "markers", color = df_swirled$.radius_str ) ## End(Not run) df_swirled <- swirl_3d( data = df, x_col = "x", y_col = "y", z_col = "z", x_radius = c(25, 50, 25, 25), y_radius = c(50, 75, 100, 25), z_radius = c(75, 25, 25, 25), origin_fn = centroid, scale_fn = function(x) { x^0.81 } ) # Plot swirls if (has_ggplot){ ggplot(df_swirled, aes(x = x_swirled, y = y_swirled, color = .radius_str, alpha = z_swirled)) + geom_vline(xintercept = mean(df$x), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$y), size = 0.2, alpha = .4, linetype = "dashed") + geom_path(alpha = .4) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "radius", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_swirled$x_swirled, y = df_swirled$y_swirled, z = df_swirled$z_swirled, type = "scatter3d", mode = "markers", color = df_swirled$.radius_str ) ## End(Not run) # # Swirl random data # The trick lies in finding the right radiuses # # Swirl the random columns df_swirled <- swirl_3d( data = df, x_col = "r1", y_col = "r2", z_col = "o", x_radius = c(0, 0, 0, 0), y_radius = c(0, 30, 60, 90), z_radius = c(10, 10, 10, 10), origin_fn = centroid ) # Not let's rotate it every 10 degrees df_rotated <- df_swirled %>% rotate_3d( x_col = "r1_swirled", y_col = "r2_swirled", z_col = "o_swirled", x_deg = rep(0, 36), y_deg = rep(0, 36), z_deg = (1:36) * 10, suffix = "", origin = df_swirled$.origin[[1]], overwrite = TRUE ) # Plot rotated swirls if (has_ggplot){ ggplot( df_rotated, aes( x = r1_swirled, y = r2_swirled, color = .degrees_str, alpha = o_swirled ) ) + geom_vline(xintercept = mean(df$r1), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$r2), size = 0.2, alpha = .4, linetype = "dashed") + geom_point(show.legend = FALSE) + theme_minimal() + labs(x = "r1", y = "r2", color = "radius", alpha = "o (opacity)") }
# Attach packages library(rearrr) library(dplyr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(4) # Create a data frame df <- data.frame( "x" = 1:50, "y" = 1:50, "z" = 1:50, "r1" = runif(50), "r2" = runif(50) * 35, "o" = 1, "g" = rep(1:5, each = 10) ) # Swirl values around (0, 0, 0) swirl_3d( data = df, x_radius = 45, x_col = "x", y_col = "y", z_col = "z", origin = c(0, 0, 0) ) # Swirl around the centroid df_swirled <- swirl_3d( data = df, x_col = "x", y_col = "y", z_col = "z", x_radius = c(100, 0, 0), y_radius = c(0, 100, 0), z_radius = c(0, 0, 100), origin_fn = centroid ) df_swirled # Plot swirls if (has_ggplot){ ggplot(df_swirled, aes(x = x_swirled, y = y_swirled, color = .radius_str, alpha = z_swirled)) + geom_vline(xintercept = mean(df$x), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$y), size = 0.2, alpha = .4, linetype = "dashed") + geom_path(alpha = .4) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "radius", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_swirled$x_swirled, y = df_swirled$y_swirled, z = df_swirled$z_swirled, type = "scatter3d", mode = "markers", color = df_swirled$.radius_str ) ## End(Not run) # Swirl around the centroid df_swirled <- swirl_3d( data = df, x_col = "x", y_col = "y", z_col = "z", x_radius = c(50, 0, 0), y_radius = c(0, 50, 0), z_radius = c(0, 0, 50), origin_fn = centroid ) df_swirled # Plot swirls if (has_ggplot){ ggplot(df_swirled, aes(x = x_swirled, y = y_swirled, color = .radius_str, alpha = z_swirled)) + geom_vline(xintercept = mean(df$x), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$y), size = 0.2, alpha = .4, linetype = "dashed") + geom_path(alpha = .4) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "radius", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_swirled$x_swirled, y = df_swirled$y_swirled, z = df_swirled$z_swirled, type = "scatter3d", mode = "markers", color = df_swirled$.radius_str ) ## End(Not run) df_swirled <- swirl_3d( data = df, x_col = "x", y_col = "y", z_col = "z", x_radius = c(25, 50, 25, 25), y_radius = c(50, 75, 100, 25), z_radius = c(75, 25, 25, 25), origin_fn = centroid, scale_fn = function(x) { x^0.81 } ) # Plot swirls if (has_ggplot){ ggplot(df_swirled, aes(x = x_swirled, y = y_swirled, color = .radius_str, alpha = z_swirled)) + geom_vline(xintercept = mean(df$x), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$y), size = 0.2, alpha = .4, linetype = "dashed") + geom_path(alpha = .4) + geom_point() + theme_minimal() + labs(x = "x", y = "y", color = "radius", alpha = "z (opacity)") } ## Not run: # Plot 3d with plotly plotly::plot_ly( x = df_swirled$x_swirled, y = df_swirled$y_swirled, z = df_swirled$z_swirled, type = "scatter3d", mode = "markers", color = df_swirled$.radius_str ) ## End(Not run) # # Swirl random data # The trick lies in finding the right radiuses # # Swirl the random columns df_swirled <- swirl_3d( data = df, x_col = "r1", y_col = "r2", z_col = "o", x_radius = c(0, 0, 0, 0), y_radius = c(0, 30, 60, 90), z_radius = c(10, 10, 10, 10), origin_fn = centroid ) # Not let's rotate it every 10 degrees df_rotated <- df_swirled %>% rotate_3d( x_col = "r1_swirled", y_col = "r2_swirled", z_col = "o_swirled", x_deg = rep(0, 36), y_deg = rep(0, 36), z_deg = (1:36) * 10, suffix = "", origin = df_swirled$.origin[[1]], overwrite = TRUE ) # Plot rotated swirls if (has_ggplot){ ggplot( df_rotated, aes( x = r1_swirled, y = r2_swirled, color = .degrees_str, alpha = o_swirled ) ) + geom_vline(xintercept = mean(df$r1), size = 0.2, alpha = .4, linetype = "dashed") + geom_hline(yintercept = mean(df$r2), size = 0.2, alpha = .4, linetype = "dashed") + geom_point(show.legend = FALSE) + theme_minimal() + labs(x = "r1", y = "r2", color = "radius", alpha = "o (opacity)") }
Scales the vectors to unit length row-wise or column-wise.
The *_vec()
version take and return a vector
.
to_unit_length( data, cols = NULL, by_row = is.data.frame(data), suffix = ifelse(isTRUE(by_row), "_row_unit", "_col_unit"), overwrite = FALSE ) to_unit_length_vec(data)
to_unit_length( data, cols = NULL, by_row = is.data.frame(data), suffix = ifelse(isTRUE(by_row), "_row_unit", "_col_unit"), overwrite = FALSE ) to_unit_length_vec(data)
data |
|
cols |
Names of columns in |
by_row |
Whether to scale row Note: Disable when |
suffix |
Suffix to add to the names of the generated columns. Use an empty string (i.e. |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Scaled vector
or data.frame
(tibble
) with the scaled columns.
Ludvig Renbo Olsen, [email protected]
Other scaling functions:
min_max_scale()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Scale row-wise to_unit_length(df, cols = c("x", "y"), by_row = TRUE) # Scale column-wise to_unit_length(df, cols = c("x", "y"), by_row = FALSE) # Overwrite columns to_unit_length(df, cols = c("x", "y"), suffix = "", overwrite = TRUE) # By groups in 'g' df %>% dplyr::group_by(g) %>% to_unit_length(cols = c("x", "y"), by_row = FALSE) # Scale a vector to_unit_length_vec(c(1:10)) to_unit_length(c(1:10), suffix = "", overwrite = TRUE) vector_length(to_unit_length_vec(c(1:10)))
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Scale row-wise to_unit_length(df, cols = c("x", "y"), by_row = TRUE) # Scale column-wise to_unit_length(df, cols = c("x", "y"), by_row = FALSE) # Overwrite columns to_unit_length(df, cols = c("x", "y"), suffix = "", overwrite = TRUE) # By groups in 'g' df %>% dplyr::group_by(g) %>% to_unit_length(cols = c("x", "y"), by_row = FALSE) # Scale a vector to_unit_length_vec(c(1:10)) to_unit_length(c(1:10), suffix = "", overwrite = TRUE) vector_length(to_unit_length_vec(c(1:10)))
Given two data.frames
with the same columns (and groupings),
transfer the centroids from one to the other.
This is commonly used to restore the centroids after transforming the columns.
transfer_centroids(to_data, from_data, cols, group_cols = NULL)
transfer_centroids(to_data, from_data, cols, group_cols = NULL)
to_data |
Existing |
from_data |
Existing |
cols |
Names of numeric columns to transfer centroids to.
Must exist in both |
group_cols |
Names of grouping columns. |
The `to_data`
data.frame
(tibble
) with the
centroids from the `from_data`
data.frame
.
Ludvig Renbo Olsen, [email protected]
Other clustering functions:
cluster_groups()
,
generate_clusters()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Create another data frame with different x and y values df2 <- df df2$x <- runif(20) df2$y <- runif(20) # Check centroids before transfer df %>% dplyr::group_by(g) %>% dplyr::summarize_all(mean) df2 %>% dplyr::group_by(g) %>% dplyr::summarize_all(mean) # Now let's transfer the centroids from df to df2 df3 <- transfer_centroids( to_data = df2, from_data = df, cols = c("x", "y"), group_cols = "g" ) # Check that the transfer gave us the same centroids as df df3 %>% dplyr::group_by(g) %>% dplyr::summarize_all(mean)
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Create another data frame with different x and y values df2 <- df df2$x <- runif(20) df2$y <- runif(20) # Check centroids before transfer df %>% dplyr::group_by(g) %>% dplyr::summarize_all(mean) df2 %>% dplyr::group_by(g) %>% dplyr::summarize_all(mean) # Now let's transfer the centroids from df to df2 df3 <- transfer_centroids( to_data = df2, from_data = df, cols = c("x", "y"), group_cols = "g" ) # Check that the transfer gave us the same centroids as df df3 %>% dplyr::group_by(g) %>% dplyr::summarize_all(mean)
Container for the type of transformation used in
Pipeline
.
Note: For internal use.
name
Name of transformation.
fn
Transformation function.
args
list
of arguments for `fn`
.
group_cols
Names of columns to group data.frame
by before applying `fn`
.
When `NULL`
, the data.frame
is not grouped.
new()
Initialize transformation.
Transformation$new(fn, args, name = NULL, group_cols = NULL)
fn
Transformation function.
args
list
of arguments for `fn`
.
name
Name of transformation.
group_cols
Names of columns to group data.frame
by before applying `fn`
.
When `NULL`
, the data.frame
is not grouped.
apply()
Apply the transformation to a data.frame
.
Transformation$apply(data)
data
data.frame
.
A grouped data.frame
will first be ungrouped. If `group_cols`
is specified,
it will then be grouped by those columns.
Transformed version of `data`
.
print()
Print an overview of the transformation.
Transformation$print(..., indent = 0, show_class = TRUE)
...
further arguments passed to or from other methods.
indent
How many spaces to indent when printing.
show_class
Whether to print the transformation class name.
The pipeline. To allow chaining of methods.
clone()
The objects of this class are cloneable with this method.
Transformation$clone(deep = FALSE)
deep
Whether to make a deep clone.
Ludvig Renbo Olsen, [email protected]
Other transformation classes:
FixedGroupsTransformation
,
GeneratedTransformation
Create the x-coordinates for a vector
of y-coordinates such that
they form a triangle.
The data points are stochastically distributed based on edge lengths, why it might be preferable to set a random seed.
This will likely look most like a triangle when the y-coordinates are somewhat equally distributed, e.g. a uniform distribution.
triangularize( data, y_col = NULL, .min = NULL, .max = NULL, offset_x = 0, keep_original = TRUE, x_col_name = ".triangle_x", edge_col_name = ".edge", overwrite = FALSE )
triangularize( data, y_col = NULL, .min = NULL, .max = NULL, offset_x = 0, keep_original = TRUE, x_col_name = ".triangle_x", edge_col_name = ".edge", overwrite = FALSE )
data |
|
y_col |
Name of column in |
.min |
Minimum y-coordinate. If |
.max |
Maximum y-coordinate. If |
offset_x |
Value to offset the x-coordinates by. |
keep_original |
Whether to keep the original columns. (Logical) Some columns may have been overwritten, in which case only the newest versions are returned. |
x_col_name |
Name of new column with the x-coordinates. |
edge_col_name |
Name of new column with the edge identifiers. If Numbering is clockwise and starts at the upper-right edge. |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
data.frame
(tibble
) with the added x-coordinates and an identifier
for the edge the data point is a part of.
Ludvig Renbo Olsen, [email protected]
Other forming functions:
circularize()
,
hexagonalize()
,
square()
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "y" = runif(200), "g" = factor(rep(1:5, each = 40)) ) # Triangularize 'y' df_tri <- triangularize(df, y_col = "y") df_tri # Plot triangle if (has_ggplot){ df_tri %>% ggplot(aes(x = .triangle_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Grouped squaring # # Triangularize 'y' for each group # First cluster the groups a bit to move the # triangles away from each other df_tri <- df %>% cluster_groups( cols = "y", group_cols = "g", suffix = "", overwrite = TRUE ) %>% dplyr::group_by(g) %>% triangularize( y_col = "y", overwrite = TRUE ) # Plot triangles if (has_ggplot){ df_tri %>% ggplot(aes(x = .triangle_x, y = y, color = g)) + geom_point() + theme_minimal() } # # Specifying minimum value # # Specify minimum value manually df_tri <- triangularize(df, y_col = "y", .min = -2) df_tri # Plot triangle if (has_ggplot){ df_tri %>% ggplot(aes(x = .triangle_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Multiple triangles by contraction # # Start by squaring 'y' df_tri <- triangularize(df, y_col = "y") # Contract '.triangle_x' and 'y' towards the centroid # To contract with multiple multipliers at once, # we wrap the call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = 1:10 / 10, .f = function(mult) { expand_distances( data = df_tri, cols = c(".triangle_x", "y"), multiplier = mult, origin_fn = centroid, overwrite = TRUE ) } ) df_expanded if (has_ggplot){ df_expanded %>% ggplot(aes( x = .triangle_x_expanded, y = y_expanded, color = .edge, alpha = .multiplier )) + geom_point() + theme_minimal() }
# Attach packages library(rearrr) library(dplyr) library(purrr) has_ggplot <- require(ggplot2) # Attach if installed # Set seed set.seed(1) # Create a data frame df <- data.frame( "y" = runif(200), "g" = factor(rep(1:5, each = 40)) ) # Triangularize 'y' df_tri <- triangularize(df, y_col = "y") df_tri # Plot triangle if (has_ggplot){ df_tri %>% ggplot(aes(x = .triangle_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Grouped squaring # # Triangularize 'y' for each group # First cluster the groups a bit to move the # triangles away from each other df_tri <- df %>% cluster_groups( cols = "y", group_cols = "g", suffix = "", overwrite = TRUE ) %>% dplyr::group_by(g) %>% triangularize( y_col = "y", overwrite = TRUE ) # Plot triangles if (has_ggplot){ df_tri %>% ggplot(aes(x = .triangle_x, y = y, color = g)) + geom_point() + theme_minimal() } # # Specifying minimum value # # Specify minimum value manually df_tri <- triangularize(df, y_col = "y", .min = -2) df_tri # Plot triangle if (has_ggplot){ df_tri %>% ggplot(aes(x = .triangle_x, y = y, color = .edge)) + geom_point() + theme_minimal() } # # Multiple triangles by contraction # # Start by squaring 'y' df_tri <- triangularize(df, y_col = "y") # Contract '.triangle_x' and 'y' towards the centroid # To contract with multiple multipliers at once, # we wrap the call in purrr::map_dfr df_expanded <- purrr::map_dfr( .x = 1:10 / 10, .f = function(mult) { expand_distances( data = df_tri, cols = c(".triangle_x", "y"), multiplier = mult, origin_fn = centroid, overwrite = TRUE ) } ) df_expanded if (has_ggplot){ df_expanded %>% ggplot(aes( x = .triangle_x_expanded, y = y_expanded, color = .edge, alpha = .multiplier )) + geom_point() + theme_minimal() }
The values are grouped in three such that the first group is formed by the lowest and highest values and the value closest to the median, the second group is formed by the second lowest and second highest values and the value second closest to the median, and so on. The values are then sorted by these groups and their actual value.
When the number of rows/elements in `data`
is not evenly divisible by three,
the `unequal_method_1`
(single excessive element) and
`unequal_method_2`
(two excessive elements)
determines which element(s) should form a smaller group.
This group will be the first group in a given grouping (see `num_groupings`
)
with the identifier 1
.
The *_vec()
version takes and returns a vector
.
Example:
The column values:
c(1, 2, 3, 4, 5, 6)
Are sorted in triplets as:
c(1, 3, 6, 2, 4, 5)
triplet_extremes( data, col = NULL, middle_is = "middle", unequal_method_1 = "middle", unequal_method_2 = c("middle", "middle"), num_groupings = 1, balance = "mean", order_by_aggregates = FALSE, shuffle_members = FALSE, shuffle_triplets = FALSE, factor_name = ifelse(num_groupings == 1, ".triplet", ".tripleting"), overwrite = FALSE ) triplet_extremes_vec( data, middle_is = "middle", unequal_method_1 = "middle", unequal_method_2 = c("middle", "middle"), num_groupings = 1, balance = "mean", order_by_aggregates = FALSE, shuffle_members = FALSE, shuffle_triplets = FALSE )
triplet_extremes( data, col = NULL, middle_is = "middle", unequal_method_1 = "middle", unequal_method_2 = c("middle", "middle"), num_groupings = 1, balance = "mean", order_by_aggregates = FALSE, shuffle_members = FALSE, shuffle_triplets = FALSE, factor_name = ifelse(num_groupings == 1, ".triplet", ".tripleting"), overwrite = FALSE ) triplet_extremes_vec( data, middle_is = "middle", unequal_method_1 = "middle", unequal_method_2 = c("middle", "middle"), num_groupings = 1, balance = "mean", order_by_aggregates = FALSE, shuffle_members = FALSE, shuffle_triplets = FALSE )
data |
|
col |
Column to create sorting factor by.
When |
middle_is |
Whether the middle element in the triplet is the nth closest element to the median value or the nth+1 lowest/highest value. One of: Triplet grouping is performed greedily from the most extreme values to the least extreme
values. E.g. Examples: When
When
When
|
unequal_method_1 , unequal_method_2
|
Method for dealing with either
a single excessive element (
Note: The excessive element(s) are extracted before triplet grouping. These elements
are put in their own group and given group identifier E.g. When |
num_groupings |
Number of times to group into triplets (recursively). At least Based on |
balance |
What to balance triplets for in a given secondary triplet grouping.
Either The first triplet grouping always groups the actual element values. meanTriplets have similar means. The values in the triplets from the previous grouping
are aggregated with spreadTriplets have similar spread (e.g. standard deviations). The values in the triplets
from the previous triplet grouping are aggregated with min / maxTriplets have similar minimum / maximum values. The values in the triplets from the
previous triplet grouping are aggregated with |
order_by_aggregates |
Whether to order the groups from initial groupings (first N.B. Only used when |
shuffle_members |
Whether to shuffle the order of the group members within the groups. (Logical) |
shuffle_triplets |
Whether to shuffle the order of the triplets. Triplet members remain together. (Logical) |
factor_name |
Name of new column with the sorting factor.
If |
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
The sorted data.frame
(tibble
) / vector
.
Optionally with the sorting factor added.
When `data`
is a vector
and `keep_factors`
is `FALSE`
,
the output will be a vector
. Otherwise, a data.frame
.
Ludvig Renbo Olsen, [email protected]
Other rearrange functions:
center_max()
,
center_min()
,
closest_to()
,
furthest_from()
,
pair_extremes()
,
position_max()
,
position_min()
,
rev_windows()
,
roll_elements()
,
shuffle_hierarchy()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:12, "A" = sample(1:12), "B" = runif(12), "C" = LETTERS[1:12], "G" = c( 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Triplet group extreme indices (row numbers) triplet_extremes(df) # Triplet group extremes in each of the columns triplet_extremes(df, col = "A")$A triplet_extremes(df, col = "B")$B triplet_extremes(df, col = "C")$C # Shuffle the members triplet-wise # The triplets maintain their order # but the rows within each triplet are shuffled triplet_extremes(df, col = "A", shuffle_members = TRUE) # Shuffle the order of the triplets # The triplets are shuffled but # the rows within each triplet maintain their order triplet_extremes(df, col = "A", shuffle_triplets = TRUE) # Use recursive grouping # Mostly meaningful with much larger datasets # Order initial grouping by group identifiers triplet_extremes(df, col = "A", num_groupings = 2) # Order initial grouping by aggregate values triplet_extremes(df, col = "A", num_groupings = 2, order_by_aggregates = TRUE) # Grouped by G # Each G group only has 4 elements # so it only creates 1 triplet and a group # with the single excessive element # per G group df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% triplet_extremes(col = "A") # Plot the extreme triplets plot( x = 1:12, y = triplet_extremes(df, col = "A")$A, col = as.character(rep(1:4, each = 3)) ) # With shuffled triplet members (run a few times) plot( x = 1:12, y = triplet_extremes(df, col = "A", shuffle_members = TRUE)$A, col = as.character(rep(1:4, each = 3)) ) # With shuffled triplets (run a few times) plot( x = rep(1:6, each = 2), y = triplet_extremes(df, col = "A", shuffle_triplets = TRUE)$A, col = as.character(rep(1:4, each = 3)) )
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "index" = 1:12, "A" = sample(1:12), "B" = runif(12), "C" = LETTERS[1:12], "G" = c( 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3 ), stringsAsFactors = FALSE ) # Triplet group extreme indices (row numbers) triplet_extremes(df) # Triplet group extremes in each of the columns triplet_extremes(df, col = "A")$A triplet_extremes(df, col = "B")$B triplet_extremes(df, col = "C")$C # Shuffle the members triplet-wise # The triplets maintain their order # but the rows within each triplet are shuffled triplet_extremes(df, col = "A", shuffle_members = TRUE) # Shuffle the order of the triplets # The triplets are shuffled but # the rows within each triplet maintain their order triplet_extremes(df, col = "A", shuffle_triplets = TRUE) # Use recursive grouping # Mostly meaningful with much larger datasets # Order initial grouping by group identifiers triplet_extremes(df, col = "A", num_groupings = 2) # Order initial grouping by aggregate values triplet_extremes(df, col = "A", num_groupings = 2, order_by_aggregates = TRUE) # Grouped by G # Each G group only has 4 elements # so it only creates 1 triplet and a group # with the single excessive element # per G group df %>% dplyr::select(G, A) %>% # For clarity dplyr::group_by(G) %>% triplet_extremes(col = "A") # Plot the extreme triplets plot( x = 1:12, y = triplet_extremes(df, col = "A")$A, col = as.character(rep(1:4, each = 3)) ) # With shuffled triplet members (run a few times) plot( x = 1:12, y = triplet_extremes(df, col = "A", shuffle_members = TRUE)$A, col = as.character(rep(1:4, each = 3)) ) # With shuffled triplets (run a few times) plot( x = rep(1:6, each = 2), y = triplet_extremes(df, col = "A", shuffle_triplets = TRUE)$A, col = as.character(rep(1:4, each = 3)) )
Calculates vector lengths/magnitudes row- or column-wise with
Where is the
vector
to get the length/magnitude of.
Should not be confused with length()
, which counts the elements.
vector_length( data, cols = NULL, by_row = is.data.frame(data), len_col_name = ".vec_len", overwrite = FALSE )
vector_length( data, cols = NULL, by_row = is.data.frame(data), len_col_name = ".vec_len", overwrite = FALSE )
data |
|
cols |
Names of columns in |
by_row |
Whether to measure length of row vectors instead of column vectors. (Logical) Note: Disable when |
len_col_name |
Name of new column with the row vector lengths when
|
overwrite |
Whether to allow overwriting of existing columns. (Logical) |
Vector length(s).
When `data`
is a vector
: scalar
When `data`
is a data.frame
and `by_row`
is TRUE
:
`data`
with an extra column with row vector lengths.
When `data`
is a data.frame
and `by_row`
is FALSE
:
A data.frame
with the summarized column vector lengths.
Ludvig Renbo Olsen, [email protected]
Other measuring functions:
angle()
,
distance()
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Measure row-wise vector_length(df, cols = c("x", "y"), by_row = TRUE) # Measure column-wise vector_length(df, cols = c("x", "y"), by_row = FALSE) # By groups in 'g' df %>% dplyr::group_by(g) %>% vector_length(cols = c("x", "y"), by_row = FALSE) # Measure vector length of a vector vector_length(c(1:10))
# Attach packages library(rearrr) library(dplyr) # Set seed set.seed(1) # Create a data frame df <- data.frame( "x" = runif(20), "y" = runif(20), "g" = rep(1:4, each = 5) ) # Measure row-wise vector_length(df, cols = c("x", "y"), by_row = TRUE) # Measure column-wise vector_length(df, cols = c("x", "y"), by_row = FALSE) # By groups in 'g' df %>% dplyr::group_by(g) %>% vector_length(cols = c("x", "y"), by_row = FALSE) # Measure vector length of a vector vector_length(c(1:10))