These functions split a dataframe into any number of dataframes such that they can be rejoined using rbind()/dplyr::bind_rows() for unrbind() or cbind()/dplyr::bind_cols() for uncbind(). Users may then wish to apply messy() to each new dataframe independently to make rejoining harder.

Usage

unrbind(data, sizes = NULL, probs = NULL, names = NULL, shuffle = TRUE)

uncbind(data, sizes = NULL, probs = NULL, names = NULL, shuffle = TRUE)

Arguments

data

input dataframe

sizes

A vector of numeric inputs summing to nrow(data) for unrbind() or ncol(data) for uncbind(); the number of rows (for unrbind()) or columns (for uncbind()) of each resulting dataframe. See probs for an alternative approach. If neither is provided, the dataframe will be split roughly in half.

probs

A vector of numeric inputs summing to 1; the proportion of rows/columns in each resulting dataframe. An alternative to sizes.

names

The names of the output list. If NULL, the list will be unnamed.

shuffle

Shuffle rows in unrbind() or columns in uncbind()? Defaults to TRUE.
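
To make the relationship between sizes and probs concrete, the short sketch below converts proportions into counts for mtcars (32 rows, 11 columns). It assumes the proportions are turned into counts by simple rounding, which happens to reproduce the piece sizes shown in the Examples; the functions themselves may compute this differently. The names row_sizes and col_sizes are illustrative only.

```r
# Assumption: counts are obtained by rounding probs * n.
# This is an illustration, not the package's documented behaviour.
probs <- c(0.5, 0.3, 0.2)

row_sizes <- round(probs * nrow(mtcars))  # mtcars has 32 rows
col_sizes <- round(probs * ncol(mtcars))  # mtcars has 11 columns

row_sizes  # 16 10 6
col_sizes  # 6 3 2

# A valid sizes vector must account for every row (or column).
stopifnot(
  sum(row_sizes) == nrow(mtcars),
  sum(col_sizes) == ncol(mtcars)
)
```

Simple rounding will not always sum to the required total for other inputs, so passing sizes explicitly is the safer choice when exact piece sizes matter.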

Value

A list of dataframes

Details

Real data can often be found in disparate files. For example, data reports may come in monthly and require row-binding together to obtain a complete annual time series. Scientific results may arrive from different laboratories and require binding together for further analysis and comparisons. These functions can simulate a single dataframe that has arrived from different sources and needs binding back together. Base R's split() offers an alternative to unrbind(), but requires a pre-existing factor column to split by and cannot as easily create random splits in the data.
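
The idea of a random row split that can be undone with rbind() can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the implementation of unrbind(): split_rows_sketch() is a hypothetical helper, and it only accepts sizes.

```r
# Hypothetical helper (not part of the package): split rows into pieces of
# the given sizes, optionally shuffling the row order first.
split_rows_sketch <- function(data, sizes, shuffle = TRUE) {
  stopifnot(sum(sizes) == nrow(data))
  n <- nrow(data)
  # Row order to draw from: random permutation or original order
  idx <- if (shuffle) sample.int(n) else seq_len(n)
  # First position of each piece within idx
  starts <- cumsum(sizes) - sizes + 1
  Map(
    function(start, size) data[idx[seq(start, length.out = size)], , drop = FALSE],
    starts, sizes
  )
}

set.seed(1)
parts <- split_rows_sketch(mtcars, sizes = c(16, 10, 6))
vapply(parts, nrow, integer(1))  # 16 10 6

# Row-binding the pieces recovers every original row (in shuffled order).
rejoined <- do.call(rbind, parts)
stopifnot(isTRUE(all.equal(rejoined[rownames(mtcars), ], mtcars)))
```

Because the pieces together contain each row exactly once, binding them back and restoring the original order gives the starting dataframe. The same approach applied to column indices, with cbind() to rejoin, gives the column-wise analogue.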

See also

Other data deconstructors: unjoin()

Author

Jack Davison

Examples

unrbind(dplyr::tibble(mtcars), probs = c(0.5, 0.3, 0.2))
#> [[1]]
#> # A tibble: 16 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.4     6 258     110  3.08  3.22  19.4     1     0     3     1
#>  2  17.8     6 168.    123  3.92  3.44  18.9     1     0     4     4
#>  3  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
#>  4  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
#>  5  21       6 160     110  3.9   2.62  16.5     0     1     4     4
#>  6  13.3     8 350     245  3.73  3.84  15.4     0     0     3     4
#>  7  19.7     6 145     175  3.62  2.77  15.5     0     1     5     6
#>  8  19.2     8 400     175  3.08  3.84  17.0     0     0     3     2
#>  9  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#> 10  15.2     8 276.    180  3.07  3.78  18       0     0     3     3
#> 11  15.2     8 304     150  3.15  3.44  17.3     0     0     3     2
#> 12  21       6 160     110  3.9   2.88  17.0     0     1     4     4
#> 13  10.4     8 460     215  3     5.42  17.8     0     0     3     4
#> 14  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
#> 15  14.7     8 440     230  3.23  5.34  17.4     0     0     3     4
#> 16  18.1     6 225     105  2.76  3.46  20.2     1     0     3     1
#> 
#> [[2]]
#> # A tibble: 10 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#>  2  16.4     8 276.    180  3.07  4.07  17.4     0     0     3     3
#>  3  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
#>  4  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
#>  5  19.2     6 168.    123  3.92  3.44  18.3     1     0     4     4
#>  6  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
#>  7  10.4     8 472     205  2.93  5.25  18.0     0     0     3     4
#>  8  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
#>  9  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#> 10  21.4     4 121     109  4.11  2.78  18.6     1     1     4     2
#> 
#> [[3]]
#> # A tibble: 6 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  15.5     8  318    150  2.76  3.52  16.9     0     0     3     2
#> 2  15.8     8  351    264  4.22  3.17  14.5     0     1     5     4
#> 3  17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
#> 4  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#> 5  15       8  301    335  3.54  3.57  14.6     0     1     5     8
#> 6  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#> 

uncbind(dplyr::tibble(mtcars), probs = c(0.5, 0.3, 0.2))
#> [[1]]
#> # A tibble: 32 × 6
#>       hp    wt  gear  disp    am    vs
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1   110  2.62     4  160      1     0
#>  2   110  2.88     4  160      1     0
#>  3    93  2.32     4  108      1     1
#>  4   110  3.22     3  258      0     1
#>  5   175  3.44     3  360      0     0
#>  6   105  3.46     3  225      0     1
#>  7   245  3.57     3  360      0     0
#>  8    62  3.19     4  147.     0     1
#>  9    95  3.15     4  141.     0     1
#> 10   123  3.44     4  168.     0     1
#> # ℹ 22 more rows
#> 
#> [[2]]
#> # A tibble: 32 × 3
#>     qsec   cyl  carb
#>    <dbl> <dbl> <dbl>
#>  1  16.5     6     4
#>  2  17.0     6     4
#>  3  18.6     4     1
#>  4  19.4     6     1
#>  5  17.0     8     2
#>  6  20.2     6     1
#>  7  15.8     8     4
#>  8  20       4     2
#>  9  22.9     4     2
#> 10  18.3     6     4
#> # ℹ 22 more rows
#> 
#> [[3]]
#> # A tibble: 32 × 2
#>      mpg  drat
#>    <dbl> <dbl>
#>  1  21    3.9 
#>  2  21    3.9 
#>  3  22.8  3.85
#>  4  21.4  3.08
#>  5  18.7  3.15
#>  6  18.1  2.76
#>  7  14.3  3.21
#>  8  24.4  3.69
#>  9  22.8  3.92
#> 10  19.2  3.92
#> # ℹ 22 more rows
#>