Package 'desctable' reference manual

Title:	Produce Descriptive and Comparative Tables Easily
Description:	Easily create descriptive and comparative tables. It makes use of, and integrates directly with, the tidyverse family of packages and pipes. Tables are produced as (nested) dataframes for easy manipulation.
Authors:	Maxime Wack [aut, cre], Adrien Boukobza [aut], Yihui Xie [ctb]
Maintainer:	Maxime Wack <[email protected]>
License:	GPL-3
Version:	0.3.0
Built:	2025-03-10 03:49:03 UTC
Source:	https://github.com/desctable/desctable

Wrapper for oneway.test(var.equal = T)

Description

Wrapper for oneway.test(var.equal = T)

Usage

ANOVA(formula)

Arguments

formula

An anova formula (variable ~ grouping variable)
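
As a rough illustration of what the wrapped call computes (a sketch using base R only, not the package's own code), oneway.test() with var.equal = TRUE should give the same F test as a classical one-way ANOVA fitted with lm(). The choice of the iris data and the names fit and ref are ours:

```r
# One-way test assuming equal variances across groups
fit <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = TRUE)

# Classical one-way ANOVA on the same formula, for comparison
ref <- anova(lm(Sepal.Length ~ Species, data = iris))

# Both p-values should agree up to floating-point error
fit$p.value
ref[["Pr(>F)"]][1]
```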

As.data.frame method for desctable

Description

As.data.frame method for desctable

Usage

## S3 method for class 'desctable'
as.data.frame(x, ...)

Arguments

`x`	A desctable
`...`	Additional as.data.frame parameters

Value

A flat dataframe

Pearson's Chi-squared Test for Count Data

Description

chisq.test performs chi-squared contingency table tests and goodness-of-fit tests, with an added method for formulas.

Usage

chisq.test(x, y, correct, p, rescale.p, simulate.p.value, B)

## Default S3 method:
chisq.test(
  x,
  y = NULL,
  correct = TRUE,
  p = rep(1/length(x), length(x)),
  rescale.p = FALSE,
  simulate.p.value = FALSE,
  B = 2000
)

## S3 method for class 'formula'
chisq.test(
  x,
  y = NULL,
  correct = T,
  p = rep(1/length(x), length(x)),
  rescale.p = F,
  simulate.p.value = F,
  B = 2000
)

Arguments

`x`	a numeric vector, or matrix, or formula of the form `lhs ~ rhs` where `lhs` and `rhs` are factors. `x` and `y` can also both be factors.
`y`	a numeric vector; ignored if `x` is a matrix or a formula. If `x` is a factor, `y` should be a factor of the same length.
`correct`	a logical indicating whether to apply continuity correction when computing the test statistic for 2 by 2 tables: one half is subtracted from all $|O - E|$ differences; however, the correction will not be bigger than the differences themselves. No correction is done if `simulate.p.value = TRUE`.
`p`	a vector of probabilities of the same length as `x`. An error is given if any entry of `p` is negative.
`rescale.p`	a logical scalar; if TRUE then `p` is rescaled (if necessary) to sum to 1. If `rescale.p` is FALSE, and `p` does not sum to 1, an error is given.
`simulate.p.value`	a logical indicating whether to compute p-values by Monte Carlo simulation.
`B`	an integer specifying the number of replicates used in the Monte Carlo test.

Details

If x is a matrix with one row or column, or if x is a vector and y is not given, then a _goodness-of-fit test_ is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given.

If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of x must be non-negative integers. Otherwise, x and y must be vectors or factors of the same length; cases with missing values are removed, the objects are coerced to factors, and the contingency table is computed from these. Then Pearson's chi-squared test is performed of the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals.

If simulate.p.value is FALSE, the p-value is computed from the asymptotic chi-squared distribution of the test statistic; continuity correction is only used in the 2-by-2 case (if correct is TRUE, the default). Otherwise the p-value is computed for a Monte Carlo test (Hope, 1968) with B replicates.

In the contingency table case simulation is done by random sampling from the set of all contingency tables with given marginals, and works only if the marginals are strictly positive. Continuity correction is never used, and the statistic is quoted without it. Note that this is not the usual sampling situation assumed for the chi-squared test but rather that for Fisher's exact test.

In the goodness-of-fit case simulation is done by random sampling from the discrete distribution specified by p, each sample being of size n = sum(x). This simulation is done in R and may be slow.
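
The goodness-of-fit case described above can be reproduced by hand. The sketch below uses base R only; the names expected, stat and pval are ours, and the counts are illustrative:

```r
x <- c(A = 20, B = 15, C = 25)          # observed counts
p <- rep(1 / length(x), length(x))      # default: all categories equally likely

expected <- sum(x) * p                  # expected counts under the null
stat <- sum((x - expected)^2 / expected)
pval <- pchisq(stat, df = length(x) - 1, lower.tail = FALSE)

res <- chisq.test(x)
c(manual = stat, chisq.test = unname(res$statistic))   # both 2.5
c(manual = pval, chisq.test = res$p.value)
```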

Value

A list with class "htest" containing the following components:

statistic: the value of the chi-squared test statistic.

parameter: the degrees of freedom of the approximate chi-squared distribution of the test statistic, NA if the p-value is computed by Monte Carlo simulation.

p.value: the p-value for the test.

method: a character string indicating the type of test performed, and whether Monte Carlo simulation or continuity correction was used.

data.name: a character string giving the name(s) of the data.

observed: the observed counts.

expected: the expected counts under the null hypothesis.

residuals: the Pearson residuals, ‘(observed - expected) / sqrt(expected)’.

stdres: standardized residuals, (observed - expected) / sqrt(V), where V is the residual cell variance (Agresti, 2007, section 2.4.5 for the case where x is a matrix, ‘n * p * (1 - p)’ otherwise).
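
For a two-dimensional table, the expected and residuals components can be recomputed from the margins. This is a minimal sketch in base R; the matrix M and the names expected and pearson are ours:

```r
M <- matrix(c(12, 5, 7, 7), ncol = 2)
res <- chisq.test(M)

# Expected counts: product of the margins divided by the grand total
expected <- outer(rowSums(M), colSums(M)) / sum(M)

# Pearson residuals, (observed - expected) / sqrt(expected)
pearson <- (M - expected) / sqrt(expected)

all.equal(res$expected, expected, check.attributes = FALSE)
all.equal(res$residuals, pearson, check.attributes = FALSE)
```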

Source

The code for Monte Carlo simulation is a C translation of the Fortran algorithm of Patefield (1981).

References

Hope, A. C. A. (1968) A simplified Monte Carlo significance test procedure. _J. Roy. Statist. Soc. B_ *30*, 582-598.

Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. _Applied Statistics_ *30*, 91-97.

Agresti, A. (2007) _An Introduction to Categorical Data Analysis, 2nd ed._, New York: John Wiley & Sons. Page 38.

Examples

## Not run: 
## From Agresti(2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
                    party = c("Democrat","Independent", "Republican"))
(Xsq <- chisq.test(M))  # Prints test summary
Xsq$observed   # observed counts (same as M)
Xsq$expected   # expected counts under the null
Xsq$residuals  # Pearson residuals
Xsq$stdres     # standardized residuals


## Effect of simulating p-values
x <- matrix(c(12, 5, 7, 7), ncol = 2)
chisq.test(x)$p.value           # 0.4233
chisq.test(x, simulate.p.value = TRUE, B = 10000)$p.value
                                # around 0.29!

## Testing for population probabilities
## Case A. Tabulated data
x <- c(A = 20, B = 15, C = 25)
chisq.test(x)
chisq.test(as.table(x))             # the same
x <- c(89,37,30,28,2)
p <- c(40,20,20,15,5)
try(
chisq.test(x, p = p)                # gives an error
)
chisq.test(x, p = p, rescale.p = TRUE)
                                # works
p <- c(0.40,0.20,0.20,0.19,0.01)
                                # Expected count in category 5
                                # is 1.86 < 5 ==> chi square approx.
chisq.test(x, p = p)            #               maybe doubtful, but is ok!
chisq.test(x, p = p, simulate.p.value = TRUE)

## Case B. Raw data
x <- trunc(5 * runif(100))
chisq.test(table(x))            # NOT 'chisq.test(x)'!

###

## End(Not run)

Create an HTML table widget using the DataTables library

Description

This function creates an HTML widget to display rectangular data (a matrix or data frame) using the JavaScript library DataTables, with a method for desctable objects.

Usage

datatable(data, ...)

## Default S3 method:
datatable(
  data,
  options = list(),
  class = "display",
  callback = DT::JS("return table;"),
  caption = NULL,
  filter = c("none", "bottom", "top"),
  escape = TRUE,
  style = "default",
  width = NULL,
  height = NULL,
  elementId = NULL,
  fillContainer = getOption("DT.fillContainer", NULL),
  autoHideNavigation = getOption("DT.autoHideNavigation", NULL),
  selection = c("multiple", "single", "none"),
  extensions = list(),
  plugins = NULL,
  ...
)

## S3 method for class 'desctable'
datatable(
  data,
  options = list(paging = F, info = F, search = list(), dom = "Brtip", fixedColumns =
    T, fixedHeader = T, buttons = c("copy", "excel")),
  class = "display",
  callback = DT::JS("return table;"),
  caption = NULL,
  filter = c("none", "bottom", "top"),
  escape = FALSE,
  style = "default",
  width = NULL,
  height = NULL,
  elementId = NULL,
  fillContainer = getOption("DT.fillContainer", NULL),
  autoHideNavigation = getOption("DT.autoHideNavigation", NULL),
  selection = c("multiple", "single", "none"),
  extensions = c("FixedHeader", "FixedColumns", "Buttons"),
  plugins = NULL,
  rownames = F,
  digits = 2,
  ...
)

Arguments

`data`	a data object (either a matrix or a data frame)
`...`	arguments passed to `format`.
`options`	a list of initialization options (see https://datatables.net/reference/option/); the character options wrapped in `JS()` will be treated as literal JavaScript code instead of normal character strings; you can also set options globally via `options(DT.options = list(...))`, and global options will be merged into this `options` argument if set
`class`	the CSS class(es) of the table; see https://datatables.net/manual/styling/classes
`callback`	the body of a JavaScript callback function with the argument `table` to be applied to the DataTables instance (i.e. `table`)
`caption`	the table caption; a character vector or a tag object generated from `htmltools::tags$caption()`
`filter`	whether/where to use column filters; `none`: no filters; `bottom/top`: put column filters at the bottom/top of the table; range sliders are used to filter numeric/date/time columns, select lists are used for factor columns, and text input boxes are used for character columns; if you want more control over the styles of filters, you can provide a list to this argument of the form `list(position = 'top', clear = TRUE, plain = FALSE)`, where `clear` indicates whether you want the clear buttons in the input boxes, and `plain` means if you want to use Bootstrap form styles or plain text input styles for the text input boxes
`escape`	whether to escape HTML entities in the table: `TRUE` means to escape the whole table, and `FALSE` means not to escape it; alternatively, you can specify numeric column indices or column names to indicate which columns to escape, e.g. `1:5` (the first 5 columns), `c(1, 3, 4)`, or `c(-1, -3)` (all columns except the first and third), or `c('Species', 'Sepal.Length')`; since the row names take the first column to display, you should add the numeric column indices by one when using `rownames`
`style`	either `'auto'`, `'default'`, `'bootstrap'`, or `'bootstrap4'`. If `'auto'`, and a bslib theme is currently active, then bootstrap styling is used in a way that "just works" for the active theme. Otherwise, DataTables `'default'` styling is used. If set explicitly to `'bootstrap'` or `'bootstrap4'`, one must take care to ensure Bootstrap's HTML dependencies (as well as Bootswatch themes, if desired) are included on the page. Note, when set explicitly, it's the user's responsibility to ensure that only one unique 'style' value is used on the same page, if multiple DT tables exist, as different styling resources may conflict with each other.
`width`	Width/Height in pixels (optional, defaults to automatic sizing)
`height`	Width/Height in pixels (optional, defaults to automatic sizing)
`elementId`	An id for the widget (a random string by default).
`fillContainer`	`TRUE` to configure the table to automatically fill its containing element. If the table can't fit fully into its container then vertical and/or horizontal scrolling of the table cells will occur.
`autoHideNavigation`	`TRUE` to automatically hide navigational UI (only display the table body) when the number of total records is less than the page size. Note, it only works on the client-side processing mode and the 'pageLength' option should be provided explicitly.
`selection`	the row/column selection mode (single or multiple selection or disable selection) when a table widget is rendered in a Shiny app; alternatively, you can use a list of the form `list(mode = 'multiple', selected = c(1, 3, 8), target = 'row', selectable = c(-2, -3))` to pre-select rows and control the selectable range; the element `target` in the list can be `'column'` to enable column selection, or `'row+column'` to make it possible to select both rows and columns (click on the footer to select columns), or `'cell'` to select cells. See details section for more info.
`extensions`	a character vector of the names of the DataTables extensions (https://datatables.net/extensions/index)
`plugins`	a character vector of the names of DataTables plug-ins (https://rstudio.github.io/DT/plugins.html). Note that only those plugins supported by the `DT` package can be used here. You can see the available plugins by calling `DT:::available_plugins()`
`rownames`	`TRUE` (show row names) or `FALSE` (hide row names) or a character vector of row names; by default, the row names are displayed in the first column of the table if exist (not `NULL`)
`digits`	the desired number of digits after the decimal point (`format = "f"`) or significant digits (`format = "g"`, `= "e"` or `= "fg"`). Default: 2 for integer, 4 for real numbers. If less than 0, the C default of 6 digits is used. If specified as more than 50, 50 will be used with a warning unless `format = "f"` where it is limited to typically 324. (Not more than 15–21 digits need be accurate, depending on the OS and compiler used. This limit is just a precaution against segfaults in the underlying C runtime.)

Note

You are recommended to escape the table content for security reasons (e.g. XSS attacks) when using this function in Shiny or any other dynamic web applications.

References

See https://rstudio.github.io/DT/ for the full documentation.

Examples

library(DT)

# see the package vignette for examples and the link to website
vignette('DT', package = 'DT')

# some boring edge cases for testing purposes
m = matrix(nrow = 0, ncol = 5, dimnames = list(NULL, letters[1:5]))
datatable(m)  # zero rows
datatable(as.data.frame(m))

m = matrix(1, dimnames = list(NULL, 'a'))
datatable(m)  # one row and one column
datatable(as.data.frame(m))

m = data.frame(a = 1, b = 2, c = 3)
datatable(m)
datatable(as.matrix(m))

# dates
datatable(data.frame(
  date = seq(as.Date("2015-01-01"), by = "day", length.out = 5), x = 1:5
))
datatable(data.frame(x = Sys.Date()))
datatable(data.frame(x = Sys.time()))


desc_output

Description

Output a desctable to the desired target format

Usage

desc_output(desctable, target = c("df", "pander", "DT"), digits = 2, ...)

Arguments

`desctable`	The desctable to output
`target`	The desired target. One of "df", "pander", or "DT".
`digits`	The number of digits to display. P values below 1E-digits will be simplified
`...`	Other arguments to pass to `data.frame`, `pander::pander`, or `DT::datatable`

Details

Output a simple or grouped desctable to different formats. Currently available formats are:

data.frame ("df")
pander ("pander")
datatable ("DT")

All numerical values will be rounded to the digits argument. If statistical tests are present, p values below 1E-digits will be replaced with "< 1E-digits" (e.g. "< 0.01" for values below 0.01 when digits = 2).
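
The p value rule above can be sketched in base R. simplify_p is a hypothetical helper written for illustration; it is not the package's implementation and may differ from it in formatting details:

```r
# Hypothetical helper: p values below 10^-digits become "< 10^-digits",
# the others are printed with a fixed number of decimals
simplify_p <- function(p, digits = 2) {
  threshold <- 10^(-digits)
  ifelse(p < threshold,
         paste0("< ", format(threshold, scientific = FALSE)),
         formatC(p, format = "f", digits = digits))
}

simplify_p(c(0.2, 0.004), digits = 2)
# "0.20"   "< 0.01"
```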

Value

The output object (or corresponding side effect)

Generate a statistics table

Description

Generate a statistics table with the chosen statistical functions, nested if called with a grouped dataframe.

Usage

desc_table(data, ..., .auto, .labels)

## Default S3 method:
desc_table(data, ..., .auto, .labels)

## S3 method for class 'data.frame'
desc_table(data, ..., .labels = NULL, .auto = stats_auto)

## S3 method for class 'grouped_df'
desc_table(data, ..., .auto = stats_auto, .labels = NULL)

Arguments

`data`	The dataframe to analyze
`...`	A list of named statistics to apply to each element of the dataframe, or a function returning a list of named statistics
`.auto`	A function to automatically determine appropriate statistics
`.labels`	A named character vector of variable labels

Value

A simple or grouped descriptive table

Stats

The statistical functions to use in the table are passed as additional arguments. If the argument is named (e.g. N = length) the name will be used as the column title instead of the function name (here, N instead of length).

Any R function can be a statistical function, as long as it returns only one value when applied to a vector, or as many values as there are levels in a factor, plus one.

Users can also use purrr::map-like formulas as quick anonymous functions (e.g. Q1 = ~ quantile(., .25) to get the first quartile in a column named Q1).

If no statistical function is given to desc_table, the .auto argument is used to provide a function that automatically determines the most appropriate statistical functions to use based on the contents of the table.
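
To make the return-value contract concrete, here is a minimal sketch of a custom statistic in base R. n_by_level is a hypothetical name, not a function exported by the package: it returns a single value for an ordinary vector, and one value per level plus one (the total) for a factor.

```r
# Hypothetical statistic: total count, plus one count per level for factors
n_by_level <- function(x) {
  if (is.factor(x)) {
    c(Total = length(x), table(x))   # nlevels(x) + 1 values
  } else {
    length(x)                        # a single value
  }
}

n_by_level(iris$Sepal.Length)
n_by_level(iris$Species)
```

Under the contract described above, such a function could presumably be passed as a named argument, e.g. N = n_by_level.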

Labels

.labels is a named character vector to provide "pretty" labels to variables.

If given, the variable names for which there is a label will be replaced by their corresponding label.

Not all variables need to have a label, and labels for non-existing variables are ignored.

.labels must be given in the form c(unquoted_variable_name = "label")
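
The replacement rule can be pictured with a few lines of base R. This is only a sketch of the behaviour described above, not the package's code; the names labels, vars and shown are ours, and not_a_column is a deliberately non-existent variable:

```r
labels <- c(hp = "Horse Power", cyl = "Cylinders", not_a_column = "Ignored")
vars <- names(mtcars)

# Replace a name by its label when one exists, keep it otherwise
has_label <- vars %in% names(labels)
shown <- vars
shown[has_label] <- unname(labels[vars[has_label]])

head(shown, 4)
# "mpg"  "Cylinders"  "disp"  "Horse Power"
```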

Output

The output is either a dataframe in the case of a simple descriptive table, or nested dataframes in the case of a comparative table.

Examples

iris %>%
  desc_table()

# Does the same as stats_auto here
iris %>%
  desc_table("N"      = length,
             "Min"    = min,
             "Q1"     = ~quantile(., .25),
             "Med"    = median,
             "Mean"   = mean,
             "Q3"     = ~quantile(., .75),
             "Max"    = max,
             "sd"     = sd,
             "IQR"    = IQR)

# With grouping on a factor
iris %>%
  group_by(Species) %>%
  desc_table(.auto = stats_auto)

Add tests to a desc_table

Description

Add test statistics to a grouped desc_table, with the tests specified as variable = test.

Usage

desc_tests(desctable, .auto = tests_auto, .default = NULL, ...)

Arguments

`desctable`	A desc_table
`.auto`	A function to automatically determine the appropriate tests
`.default`	A default fallback test
`...`	A list of statistical tests associated to variable names

Value

A desc_table with tests

Tests

The statistical test functions to use in the table are passed as additional named arguments. Tests must be preceded by a formula tilde (~). name = ~test will apply the test function test to the variable name.

Any R test function can be used, as long as it returns an object containing a p.value element, which is the case for most tests returning an object of class htest.

Users can also use purrr::map-like formulas as quick anonymous functions (e.g. ~ t.test(., var.equal = T) to compute a t test without the Welch correction).
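
The requirement that a test returns an object with a p.value element can be checked directly in base R. The sketch below calls three standard tests on the iris data (our choice) outside of any desc_table pipeline:

```r
tests <- list(
  kruskal = kruskal.test(Sepal.Length ~ Species, data = iris),
  welch   = oneway.test(Sepal.Length ~ Species, data = iris),
  anova   = oneway.test(Sepal.Length ~ Species, data = iris, var.equal = TRUE)
)

# Each result is an "htest" object and carries a p.value element
vapply(tests, inherits, logical(1), what = "htest")
p_values <- vapply(tests, function(t) t$p.value, numeric(1))
p_values
```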

Examples

iris %>%
  group_by(Species) %>%
  desc_table() %>%
  desc_tests(Sepal.Length = ~kruskal.test,
             Sepal.Width  = ~oneway.test,
             Petal.Length = ~oneway.test(., var.equal = T),
             Petal.Width  = ~oneway.test(., var.equal = F))

Generate a statistics table

Description

Generate a statistics table with the chosen statistical functions, and tests if given a "grouped" dataframe.

Usage

desctable(data, stats, tests, labels)

## Default S3 method:
desctable(data, stats = stats_auto, tests, labels = NULL)

## S3 method for class 'grouped_df'
desctable(data, stats = stats_auto, tests = tests_auto, labels = NULL)

Arguments

`data`	The dataframe to analyze
`stats`	A list of named statistics to apply to each element of the dataframe, or a function returning a list of named statistics
`tests`	A list of statistical tests to use when calling desctable with a grouped_df
`labels`	A named character vector of labels to use instead of variable names

Value

A desctable object, which prints to a table of statistics for all variables

Labels

labels is an optional named character vector used to make the table prettier.

If given, the variable names for which there is a label will be replaced by their corresponding label.

Not all variables need to have a label, and labels for non-existing variables are ignored.

labels must be given in the form c(unquoted_variable_name = "label")

Stats

The stats can be a function which takes a dataframe and returns a list of statistical functions to use.

stats can also be a named list of statistical functions, or purrr::map-like formulas.

The names will be used as column names in the resulting table. If an element of the list is a function, it will be used as-is for the stats.

Tests

The tests can be a function which takes a variable and a grouping variable, and returns an appropriate statistical test to use in that case.

tests can also be a named list of statistical test functions, associating the name of a variable in the data and a test to use specifically for that variable.

The test must be expressed as a single-term formula (e.g. ~t.test), or a purrr::map-like formula (e.g. ~t.test(., var.equal = T)). You don't have to specify tests for all the variables: a default test for all other variables can be defined with the name .default, and an automatic test can be defined with the name .auto.

If data is a grouped dataframe (using group_by), subtables are created and statistic tests are performed over each sub-group.
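
As an illustration of a function that takes a variable and a grouping variable and returns a test, here is a deliberately simple sketch in base R. pick_test is hypothetical and is not the package's tests_auto; how the package then calls the returned test is not shown here, so the sketch applies it directly to the two vectors:

```r
# Hypothetical chooser: exact test for factors, rank-based test otherwise
pick_test <- function(var, grp) {
  if (is.factor(var)) fisher.test else kruskal.test
}

grp <- iris$Petal.Length > 5

# Numeric variable: the Kruskal-Wallis test is selected and applied
test_fun <- pick_test(iris$Sepal.Width, grp)
res <- test_fun(iris$Sepal.Width, grp)
res$p.value

# Factor variable: Fisher's exact test is selected instead
identical(pick_test(iris$Species, grp), fisher.test)
```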

Output

The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods are provided for printing and for use with pander and DT. Printing reduces the object to a dataframe.

Examples

iris %>%
  desctable()

# Does the same as stats_auto here
iris %>%
  desctable(stats = list("N"      = length,
                         "Mean"   = ~ if (is.normal(.)) mean(.),
                         "sd"     = ~ if (is.normal(.)) sd(.),
                         "Med"    = stats::median,
                         "IQR"    = ~ if(!is.factor(.)) IQR(.)))

# With labels
mtcars %>% desctable(labels = c(hp  = "Horse Power",
                                cyl = "Cylinders",
                                mpg = "Miles per gallon"))

# With grouping on a factor
iris %>%
  group_by(Species) %>%
  desctable(stats = stats_default)

# With nested grouping, on arbitrary variables
mtcars %>%
  group_by(vs, cyl) %>%
  desctable()

# With grouping on a condition, and choice of tests
iris %>%
  group_by(Petal.Length > 5) %>%
  desctable(tests = list(.auto = tests_auto, Species = ~chisq.test))

Fisher's Exact Test for Count Data

Description

Performs Fisher's exact test for testing the null of independence of rows and columns in a contingency table with fixed marginals, or with a formula expression.

Usage

fisher.test(
  x,
  y,
  workspace,
  hybrid,
  control,
  or,
  alternative,
  conf.int,
  conf.level,
  simulate.p.value,
  B
)

## Default S3 method:
fisher.test(x, ...)

## S3 method for class 'formula'
fisher.test(
  x,
  y = NULL,
  workspace = 200000,
  hybrid = F,
  control = list(),
  or = 1,
  alternative = "two.sided",
  conf.int = T,
  conf.level = 0.95,
  simulate.p.value = F,
  B = 2000
)

Arguments

`x`	either a two-dimensional contingency table in matrix form, a factor object, or a formula of the form `lhs ~ rhs` where `lhs` and `rhs` are factors.
`y`	a factor object; ignored if `x` is a matrix or a formula.
`workspace`	an integer specifying the size of the workspace used in the network algorithm. In units of 4 bytes. Only used for non-simulated p-values in tables larger than $2 \times 2$. Since R version 3.5.0, this also increases the internal stack size which allows larger problems to be solved, however sometimes needing hours. In such cases, `simulate.p.value = TRUE` may be more reasonable.
`hybrid`	a logical. Only used for larger than $2 \times 2$ tables, in which cases it indicates whether the exact probabilities (default) or a hybrid approximation thereof should be computed.
`control`	a list with named components for low level algorithm control. At present the only one used is `"mult"`, a positive integer $\ge 2$ with default 30 used only for larger than $2 \times 2$ tables. This says how many times as much space should be allocated to paths as to keys: see file ‘fexact.c’ in the sources of this package.
`or`	the hypothesized odds ratio. Only used in the $2 \times 2$ case.
`alternative`	indicates the alternative hypothesis and must be one of `"two.sided"`, `"greater"` or `"less"`. You can specify just the initial letter. Only used in the $2 \times 2$ case.
`conf.int`	logical indicating if a confidence interval for the odds ratio in a $2 \times 2$ table should be computed (and returned).
`conf.level`	confidence level for the returned confidence interval. Only used in the $2 \times 2$ case and if `conf.int = TRUE`.
`simulate.p.value`	a logical indicating whether to compute p-values by Monte Carlo simulation, in larger than $2 \times 2$ tables.
`B`	an integer specifying the number of replicates used in the Monte Carlo test.
`...`	additional parameters passed to the original fisher.test

Details

If x is a matrix, it is taken as a two-dimensional contingency table, and hence its entries should be nonnegative integers. Otherwise, both x and y must be vectors of the same length. Incomplete cases are removed, the vectors are coerced into factor objects, and the contingency table is computed from these.

For 2 by 2 cases, p-values are obtained directly using the (central or non-central) hypergeometric distribution. Otherwise, computations are based on a C version of the FORTRAN subroutine FEXACT which implements the network developed by Mehta and Patel (1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN code can be obtained from http://www.netlib.org/toms/643. Note this fails (with an error message) when the entries of the table are too large. (It transposes the table if necessary so it has no more rows than columns. One constraint is that the product of the row marginals be less than 2^31 - 1.)

For 2 by 2 tables, the null of conditional independence is equivalent to the hypothesis that the odds ratio equals one. Exact inference can be based on observing that in general, given all marginal totals fixed, the first element of the contingency table has a non-central hypergeometric distribution with non-centrality parameter given by the odds ratio (Fisher, 1935). The alternative for a one-sided test is based on the odds ratio, so `alternative = "greater"` is a test of the odds ratio being bigger than `or`.

Two-sided tests are based on the probabilities of the tables, and take as more extreme all tables with probabilities less than or equal to that of the observed table, the p-value being the sum of such probabilities.
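
The two-sided rule can be illustrated by hand for a 2 by 2 table. The following is a minimal sketch, not the code used by fisher.test: it applies base R's dhyper to the tea-tasting table from the Examples section, and the small tolerance factor is an assumption added here to absorb floating-point ties.

```r
# Tea-tasting table from the Examples section
TeaTasting <- matrix(c(3, 1, 1, 3), nrow = 2)

m <- sum(TeaTasting[, 1])  # first column total
n <- sum(TeaTasting[, 2])  # second column total
k <- sum(TeaTasting[1, ])  # first row total
x <- TeaTasting[1, 1]      # observed first cell

# Null (odds ratio = 1) probabilities of every table with these margins,
# indexed by the value of the first cell
support <- max(0, k - n):min(k, m)
d <- dhyper(support, m, n, k)

# Two-sided: sum the probabilities of all tables that are
# no more likely than the observed one
d_obs <- dhyper(x, m, n, k)
p_two_sided <- sum(d[d <= d_obs * (1 + 1e-7)])  # tolerance is an assumption

# One-sided "greater": tables whose first cell is at least the observed value
p_greater <- sum(d[support >= x])
```

Under these assumptions, p_two_sided should agree with fisher.test(TeaTasting)$p.value, and p_greater with the one-sided p-value of 0.2429 quoted in the Examples.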

For larger than 2 by 2 tables and hybrid = TRUE, asymptotic chi-squared probabilities are only used if the ‘Cochran conditions’ are satisfied, that is if no cell has count zero, and more than 80% of the cells have counts at least 5: otherwise the exact calculation is used.

Simulation is done conditional on the row and column marginals, and works only if the marginals are strictly positive. (A C translation of the algorithm of Patefield (1981) is used.)

Value

A list with class "htest" containing the following components:

p.value: the p-value of the test.

conf.int: a confidence interval for the odds ratio. Only present in the 2 by 2 case and if argument conf.int = TRUE.

estimate: an estimate of the odds ratio. Note that the _conditional_ Maximum Likelihood Estimate (MLE) rather than the unconditional MLE (the sample odds ratio) is used. Only present in the 2 by 2 case.

null.value: the odds ratio under the null, `or`. Only present in the 2 by 2 case.

alternative: a character string describing the alternative hypothesis.

method: the character string "Fisher's Exact Test for Count Data".

data.name: a character string giving the names of the data.

References

Agresti, A. (1990) _Categorical data analysis_. New York: Wiley. Pages 59-66.

Agresti, A. (2002) _Categorical data analysis_. Second edition. New York: Wiley. Pages 91-101.

Fisher, R. A. (1935) The logic of inductive inference. _Journal of the Royal Statistical Society Series A_ *98*, 39-54.

Fisher, R. A. (1962) Confidence limits for a cross-product ratio. _Australian Journal of Statistics_ *4*, 41.

Fisher, R. A. (1970) _Statistical Methods for Research Workers._ Oliver & Boyd.

Mehta, C. R. and Patel, N. R. (1986) Algorithm 643. FEXACT: A Fortran subroutine for Fisher's exact test on unordered r*c contingency tables. _ACM Transactions on Mathematical Software_, *12*, 154-161.

Clarkson, D. B., Fan, Y. and Joe, H. (1993) A Remark on Algorithm 643: FEXACT: An Algorithm for Performing Fisher's Exact Test in r x c Contingency Tables. _ACM Transactions on Mathematical Software_, *19*, 484-488.

Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. _Applied Statistics_ *30*, 91-97.

Examples

## Not run: 
## Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinker
## A British woman claimed to be able to distinguish whether milk or
##  tea was added to the cup first.  To test, she was given 8 cups of
##  tea, in four of which milk was added first.  The null hypothesis
##  is that there is no association between the true order of pouring
##  and the woman's guess, the alternative that there is a positive
##  association (that the odds ratio is greater than 1).
TeaTasting <-
matrix(c(3, 1, 1, 3),
       nrow = 2,
       dimnames = list(Guess = c("Milk", "Tea"),
                       Truth = c("Milk", "Tea")))
fisher.test(TeaTasting, alternative = "greater")
## => p = 0.2429, association could not be established

## Fisher (1962, 1970), Criminal convictions of like-sex twins
Convictions <-
matrix(c(2, 10, 15, 3),
       nrow = 2,
       dimnames =
       list(c("Dizygotic", "Monozygotic"),
            c("Convicted", "Not convicted")))
Convictions
fisher.test(Convictions, alternative = "less")
fisher.test(Convictions, conf.int = FALSE)
fisher.test(Convictions, conf.level = 0.95)$conf.int
fisher.test(Convictions, conf.level = 0.99)$conf.int

## A r x c table  Agresti (2002, p. 57) Job Satisfaction
Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4,
dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"),
                satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))
fisher.test(Job)
fisher.test(Job, simulate.p.value = TRUE, B = 1e5)

###

## End(Not run)

Return the inter-quartile range

Description

Safe version of IQR for statify

Usage

IQR(x)

Arguments

x

A vector

Value

The IQR

Test if distribution is normal

Description

Test if distribution is normal. The condition for normality is length > 30 and a non-significant Shapiro-Wilk test with p > .1

Usage

is.normal(x)

Arguments

`x`	A numerical vector

Value

A boolean
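
The rule stated in the Description can be sketched in base R as follows. This is a hypothetical re-implementation for illustration only, not the package source: the name is_normal_sketch and the removal of missing values are assumptions.

```r
# Hypothetical sketch of the documented rule:
# more than 30 observations AND Shapiro-Wilk p-value > 0.1
is_normal_sketch <- function(x) {
  x <- x[!is.na(x)]  # assumption: missing values are ignored
  # && short-circuits, so shapiro.test is never called on short vectors.
  # Note that shapiro.test itself errors for more than 5000 values
  # or when all values are identical.
  length(x) > 30 && stats::shapiro.test(x)$p.value > 0.1
}

is_normal_sketch(1:10)                            # too short
is_normal_sketch(qnorm(ppoints(200)))             # ideal normal quantiles
is_normal_sketch(exp(seq(0, 10, length.out = 100)))  # strongly skewed
```

Requiring both conditions means that small samples are never treated as normal, whatever the Shapiro-Wilk test says.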

No test

Description

An empty test

Usage

no.test(formula)

Arguments

formula

A formula

Pander method for desctable

Description

Pander method to output a desctable

Usage

## S3 method for class 'desctable'
pander(
  x = NULL,
  digits = 2,
  justify = "left",
  missing = "",
  keep.line.breaks = T,
  split.tables = Inf,
  emphasize.rownames = F,
  ...
)

Arguments

`x`	A desctable
`digits`	passed to `format`. Can be a vector specifying values for each column (has to be the same length as number of columns).
`justify`	defines alignment in cells, passed to `format`. Can be `left`, `right` or `centre` (also spelled `center`). Defaults to `left` here; `pandoc.table` itself defaults to `centre`. Can be abbreviated to a string consisting of the letters `l`, `c` and `r` (e.g. `'lcr'` instead of `c('left', 'centre', 'right')`).
`missing`	string to replace missing values
`keep.line.breaks`	whether to keep or remove line breaks from cells in a table. Defaults to `TRUE` here; `pandoc.table` itself defaults to `FALSE`.
`split.tables`	where to split wide tables into separate tables. Defaults to `Inf` here, which disables splitting; `pandoc.table` itself defaults to `80`, the conventional number of characters in a line.
`emphasize.rownames`	boolean indicating whether row names should be highlighted. Defaults to `FALSE` here; `pandoc.table` itself defaults to `TRUE`.
`...`	unsupported extra arguments directly placed into `/dev/null`

Details

Uses pandoc.table, with some default parameters (digits = 2, justify = "left", missing = "", keep.line.breaks = T, split.tables = Inf, and emphasize.rownames = F), that you can override if needed.

Return the percentages for the levels of a factor

Description

Return a compatible vector of length nlevels(x) + 1 to print the percentages of each level of a factor

Usage

percent(x)

Arguments

x

A factor

Value

A nlevels(x) + 1 length vector of percentages
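
To make the shape of the return value concrete, here is a minimal sketch in base R. It is not the package implementation: the name percent_sketch is hypothetical, and placing NA in the extra leading position (so that the vector lines up with a header row for the variable itself) is an assumption.

```r
# Hypothetical sketch: one leading placeholder, then one percentage per level
percent_sketch <- function(x) {
  stopifnot(is.factor(x))
  # table() counts each level; dividing by length(x) gives proportions
  c(NA, as.vector(table(x)) / length(x) * 100)
}

f <- factor(c("a", "a", "b", "c"))
res <- percent_sketch(f)
length(res) == nlevels(f) + 1
```

In this sketch missing values are not counted in any level but still contribute to length(x), so the percentages can sum to less than 100 when the factor contains NA.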

Print method for desctable

Description

Print method for desctable

Usage

## S3 method for class 'desctable'
print(x, ...)

Arguments

`x`	A desctable
`...`	Additional print parameters

Value

A flat dataframe

Function to create a list of statistics to use in desctable

Description

This function takes a dataframe as argument and returns a list of statistics in the form accepted by desctable.

Usage

stats_auto(data)

Arguments

data

The dataframe to apply the statistic to

Details

You can define your own automatic function, as long as it takes a dataframe as argument and returns a list of functions, or formulas defining conditions to use a stat function.
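
As a minimal sketch of such a custom function, the example below returns a plain named list of functions and uses base R only. The name my_stats and the choice of statistics are hypothetical, and the conditional formula form is deliberately left out.

```r
# Hypothetical custom "automatic" function: takes a dataframe,
# returns a named list of statistical functions
my_stats <- function(data) {
  list(
    "N"    = length,
    "Mean" = function(x) mean(x, na.rm = TRUE),
    "SD"   = function(x) stats::sd(x, na.rm = TRUE)
  )
}

# Check the shape of the result by applying each statistic to one column
s <- my_stats(iris)
vapply(s, function(f) f(iris$Sepal.Length), numeric(1))
```

A function of this shape is what the text above describes as acceptable in place of stats_auto; this sketch always returns the same list, whereas a real one could inspect the dataframe it receives to decide which statistics to include.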

Value

A list of statistics to use, assessed from the content of the dataframe

Define a list of default statistics

Description

Define a list of default statistics

Usage

stats_default(data)

stats_normal(data)

stats_nonnormal(data)

Arguments

data

A dataframe

Value

A list of statistical functions

Function to choose a statistical test

Description

This function takes a variable and a grouping variable as arguments, and returns a statistical test to use, expressed as a single-term formula.

Usage

tests_auto(var, grp)

Arguments

`var`	The variable to test
`grp`	The variable for the groups

Details

For non-factor variables, this function picks a non-parametric test according to the number of levels of the grouping variable (wilcox.test for two levels, kruskal.test for more). For factors, it uses fisher.test, falling back to chisq.test if fisher.test raises an error.
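
The selection logic described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package source: the name choose_test_sketch is hypothetical, and running fisher.test once to see whether it errors is one possible way to implement the fallback.

```r
# Hypothetical sketch: return the chosen test as a single-term formula
choose_test_sketch <- function(var, grp) {
  grp <- factor(grp)
  if (is.factor(var)) {
    # Try the exact test; fall back to the chi-squared test on error
    tryCatch({
      stats::fisher.test(var, grp)
      ~fisher.test
    }, error = function(e) ~chisq.test)
  } else if (nlevels(grp) == 2) {
    ~wilcox.test
  } else {
    ~kruskal.test
  }
}

t_two   <- choose_test_sketch(mtcars$mpg, mtcars$am)           # 2 groups
t_many  <- choose_test_sketch(iris$Sepal.Length, iris$Species) # 3 groups
t_factor <- choose_test_sketch(factor(mtcars$cyl), mtcars$am)  # factor variable
```

Returning a formula rather than the function itself keeps the name of the test available for display in the resulting table.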

Value

A statistical test function
