Title: | Produce Descriptive and Comparative Tables Easily |
---|---|
Description: | Easily create descriptive and comparative tables. It makes use of and integrates directly with the tidyverse family of packages and pipes. Tables are produced as (nested) dataframes for easy manipulation. |
Authors: | Maxime Wack [aut, cre], Adrien Boukobza [aut], Yihui Xie [ctb] |
Maintainer: | Maxime Wack <[email protected]> |
License: | GPL-3 |
Version: | 0.3.0 |
Built: | 2025-03-10 03:49:03 UTC |
Source: | https://github.com/desctable/desctable |
Wrapper for oneway.test(var.equal = T)
ANOVA(formula)
formula |
An anova formula ( |
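Since ANOVA is described as a wrapper for oneway.test(var.equal = T), the underlying call can be sketched with base R alone. The snippet below is only an illustration on the built-in iris data; the formula Sepal.Length ~ Species is chosen here as an example and does not come from the package documentation.

```r
# Equal-variance one-way test: the call that ANOVA() is documented to wrap
res <- oneway.test(Sepal.Length ~ Species, data = iris, var.equal = TRUE)

# The classical ANOVA table from a linear model yields the same F statistic
classic <- anova(lm(Sepal.Length ~ Species, data = iris))

res$statistic             # F statistic from oneway.test
classic[["F value"]][1]   # F statistic from the ANOVA table
res$p.value               # the element a comparative table would report
```

With var.equal = TRUE the result matches a classical one-way ANOVA; leaving the argument at its default gives the Welch version instead.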
As.data.frame method for desctable
## S3 method for class 'desctable'
as.data.frame(x, ...)
x |
A desctable |
... |
Additional as.data.frame parameters |
A flat dataframe
chisq.test performs chi-squared contingency table tests and goodness-of-fit tests, with an added method for formulas.
chisq.test(x, y, correct, p, rescale.p, simulate.p.value, B)

## Default S3 method:
chisq.test(
  x,
  y = NULL,
  correct = TRUE,
  p = rep(1/length(x), length(x)),
  rescale.p = FALSE,
  simulate.p.value = FALSE,
  B = 2000
)

## S3 method for class 'formula'
chisq.test(
  x,
  y = NULL,
  correct = T,
  p = rep(1/length(x), length(x)),
  rescale.p = F,
  simulate.p.value = F,
  B = 2000
)
x |
a numeric vector, or matrix, or formula of the form |
y |
a numeric vector; ignored if |
correct |
a logical indicating whether to apply continuity
correction when computing the test statistic for 2 by 2 tables: one
half is subtracted from all |
p |
a vector of probabilities of the same length of |
rescale.p |
a logical scalar; if TRUE then |
simulate.p.value |
a logical indicating whether to compute p-values by Monte Carlo simulation. |
B |
an integer specifying the number of replicates used in the Monte Carlo test. |
If x is a matrix with one row or column, or if x is a vector and y is not given, then a _goodness-of-fit test_ is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given.
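The goodness-of-fit case can be checked by hand. The following sketch uses base R only and reuses the tabulated counts from the examples of this help page; the variable names are illustrative.

```r
x <- c(A = 20, B = 15, C = 25)          # observed counts
p <- rep(1 / length(x), length(x))      # default: all categories equally likely
expected <- sum(x) * p                  # expected counts under the null

# Pearson statistic and its asymptotic p-value (k - 1 degrees of freedom)
stat_by_hand <- sum((x - expected)^2 / expected)
p_by_hand <- pchisq(stat_by_hand, df = length(x) - 1, lower.tail = FALSE)

fit <- chisq.test(x)
c(by_hand = stat_by_hand, chisq.test = unname(fit$statistic))
c(by_hand = p_by_hand, chisq.test = fit$p.value)
```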
If x is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table: the entries of x must be non-negative integers. Otherwise, x and y must be vectors or factors of the same length; cases with missing values are removed, the objects are coerced to factors, and the contingency table is computed from these. Then Pearson's chi-squared test is performed of the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals.
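For the contingency-table case, "product of the row and column marginals" translates into expected counts of row total times column total divided by the grand total. A minimal base R sketch, using the Agresti table from the examples of this help page:

```r
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
                    party = c("Democrat", "Independent", "Republican"))

n <- sum(M)
expected <- outer(rowSums(M), colSums(M)) / n     # counts under independence
stat_by_hand <- sum((M - expected)^2 / expected)  # Pearson chi-squared statistic
df <- (nrow(M) - 1) * (ncol(M) - 1)               # degrees of freedom

fit <- chisq.test(M)  # 2 x 3 table, so no continuity correction is applied
c(by_hand = stat_by_hand, chisq.test = unname(fit$statistic))
pchisq(stat_by_hand, df = df, lower.tail = FALSE)
```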
If simulate.p.value is FALSE, the p-value is computed from the asymptotic chi-squared distribution of the test statistic; continuity correction is only used in the 2-by-2 case (if correct is TRUE, the default). Otherwise the p-value is computed for a Monte Carlo test (Hope, 1968) with B replicates.
In the contingency table case simulation is done by random sampling from the set of all contingency tables with given marginals, and works only if the marginals are strictly positive. Continuity correction is never used, and the statistic is quoted without it. Note that this is not the usual sampling situation assumed for the chi-squared test but rather that for Fisher's exact test.
In the goodness-of-fit case simulation is done by random sampling from the discrete distribution specified by p, each sample being of size n = sum(x). This simulation is done in R and may be slow.
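The goodness-of-fit simulation described above can be imitated in a few lines. This is a sketch of the idea, not the code used by chisq.test: the helper chisq_stat and the seed are choices made for this illustration, and the p-value uses the usual (1 + count) / (B + 1) Monte Carlo estimate.

```r
set.seed(1)                               # arbitrary seed, for reproducibility
x <- c(89, 37, 30, 28, 2)                 # observed counts
p <- c(0.40, 0.20, 0.20, 0.19, 0.01)      # hypothesized probabilities
n <- sum(x)
B <- 2000                                 # number of Monte Carlo replicates

# Hypothetical helper: Pearson statistic for one vector of counts
chisq_stat <- function(obs, expd) sum((obs - expd)^2 / expd)

expected <- n * p
observed_stat <- chisq_stat(x, expected)

# Each column of `sims` is one sample of size n drawn from p
sims <- rmultinom(B, size = n, prob = p)
sim_stats <- apply(sims, 2, chisq_stat, expd = expected)

# Share of simulated statistics at least as large as the observed one
p_mc <- (1 + sum(sim_stats >= observed_stat)) / (B + 1)
p_mc
```

The result should be close to, but not identical with, chisq.test(x, p = p, simulate.p.value = TRUE)$p.value, since both depend on the random draws.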
A list with class "htest" containing the following components:
statistic: the value of the chi-squared test statistic.
parameter: the degrees of freedom of the approximate chi-squared distribution of the test statistic, NA if the p-value is computed by Monte Carlo simulation.
p.value: the p-value for the test.
method: a character string indicating the type of test performed, and whether Monte Carlo simulation or continuity correction was used.
data.name: a character string giving the name(s) of the data.
observed: the observed counts.
expected: the expected counts under the null hypothesis.
residuals: the Pearson residuals, (observed - expected) / sqrt(expected).
stdres: standardized residuals, (observed - expected) / sqrt(V), where V is the residual cell variance (Agresti, 2007, section 2.4.5 for the case where x is a matrix, n * p * (1 - p) otherwise).
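The two residual components can be reproduced from the observed and expected counts. The sketch below covers the matrix case only, where the residual cell variance is taken as expected * (1 - row proportion) * (1 - column proportion); variable names are illustrative.

```r
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
n <- sum(M)
row_tot <- rowSums(M)
col_tot <- colSums(M)
expected <- outer(row_tot, col_tot) / n

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson <- (M - expected) / sqrt(expected)

# Standardized residuals: divide by sqrt(V), the residual cell variance
V <- expected * outer(1 - row_tot / n, 1 - col_tot / n)
stdres <- (M - expected) / sqrt(V)

fit <- chisq.test(M)
all.equal(as.vector(pearson), as.vector(fit$residuals))
all.equal(as.vector(stdres), as.vector(fit$stdres))
```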
The code for Monte Carlo simulation is a C translation of the Fortran algorithm of Patefield (1981).
Hope, A. C. A. (1968) A simplified Monte Carlo significance test procedure. _J. Roy. Statist. Soc. B_ *30*, 582-598.
Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. _Applied Statistics_ *30*, 91-97.
Agresti, A. (2007) _An Introduction to Categorical Data Analysis, 2nd ed._, New York: John Wiley & Sons. Page 38.
For goodness-of-fit testing, notably of continuous distributions, see ks.test.
## Not run:
## From Agresti(2007) p.39
M <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(M) <- list(gender = c("F", "M"),
                    party = c("Democrat","Independent", "Republican"))
(Xsq <- chisq.test(M))  # Prints test summary
Xsq$observed   # observed counts (same as M)
Xsq$expected   # expected counts under the null
Xsq$residuals  # Pearson residuals
Xsq$stdres     # standardized residuals

## Effect of simulating p-values
x <- matrix(c(12, 5, 7, 7), ncol = 2)
chisq.test(x)$p.value           # 0.4233
chisq.test(x, simulate.p.value = TRUE, B = 10000)$p.value  # around 0.29!

## Testing for population probabilities
## Case A. Tabulated data
x <- c(A = 20, B = 15, C = 25)
chisq.test(x)
chisq.test(as.table(x))         # the same
x <- c(89,37,30,28,2)
p <- c(40,20,20,15,5)
try(
  chisq.test(x, p = p)          # gives an error
)
chisq.test(x, p = p, rescale.p = TRUE)  # works
p <- c(0.40,0.20,0.20,0.19,0.01)
# Expected count in category 5
# is 1.86 < 5 ==> chi square approx.
chisq.test(x, p = p)            # maybe doubtful, but is ok!
chisq.test(x, p = p, simulate.p.value = TRUE)

## Case B. Raw data
x <- trunc(5 * runif(100))
chisq.test(table(x))            # NOT 'chisq.test(x)'!
## End(Not run)
This function creates an HTML widget to display rectangular data (a matrix or data frame) using the JavaScript library DataTables, with a method for desctable objects.
datatable(data, ...)

## Default S3 method:
datatable(
  data,
  options = list(),
  class = "display",
  callback = DT::JS("return table;"),
  caption = NULL,
  filter = c("none", "bottom", "top"),
  escape = TRUE,
  style = "default",
  width = NULL,
  height = NULL,
  elementId = NULL,
  fillContainer = getOption("DT.fillContainer", NULL),
  autoHideNavigation = getOption("DT.autoHideNavigation", NULL),
  selection = c("multiple", "single", "none"),
  extensions = list(),
  plugins = NULL,
  ...
)

## S3 method for class 'desctable'
datatable(
  data,
  options = list(paging = F, info = F, search = list(), dom = "Brtip",
    fixedColumns = T, fixedHeader = T, buttons = c("copy", "excel")),
  class = "display",
  callback = DT::JS("return table;"),
  caption = NULL,
  filter = c("none", "bottom", "top"),
  escape = FALSE,
  style = "default",
  width = NULL,
  height = NULL,
  elementId = NULL,
  fillContainer = getOption("DT.fillContainer", NULL),
  autoHideNavigation = getOption("DT.autoHideNavigation", NULL),
  selection = c("multiple", "single", "none"),
  extensions = c("FixedHeader", "FixedColumns", "Buttons"),
  plugins = NULL,
  rownames = F,
  digits = 2,
  ...
)
data |
a data object (either a matrix or a data frame) |
... |
arguments passed to |
options |
a list of initialization options (see
https://datatables.net/reference/option/); the character options
wrapped in |
class |
the CSS class(es) of the table; see https://datatables.net/manual/styling/classes |
callback |
the body of a JavaScript callback function with the argument
|
caption |
the table caption; a character vector or a tag object
generated from |
filter |
whether/where to use column filters; |
escape |
whether to escape HTML entities in the table: |
style |
either |
width |
Width/Height in pixels (optional, defaults to automatic sizing) |
height |
Width/Height in pixels (optional, defaults to automatic sizing) |
elementId |
An id for the widget (a random string by default). |
fillContainer |
|
autoHideNavigation |
|
selection |
the row/column selection mode (single or multiple selection
or disable selection) when a table widget is rendered in a Shiny app;
alternatively, you can use a list of the form |
extensions |
a character vector of the names of the DataTables extensions (https://datatables.net/extensions/index) |
plugins |
a character vector of the names of DataTables plug-ins
(https://rstudio.github.io/DT/plugins.html). Note that only those
plugins supported by the |
rownames |
|
digits |
the desired number of digits after the decimal
point ( Default: 2 for integer, 4 for real numbers. If less than 0,
the C default of 6 digits is used. If specified as more than 50, 50
will be used with a warning unless |
You are recommended to escape the table content for security reasons (e.g. XSS attacks) when using this function in Shiny or any other dynamic web applications.
See https://rstudio.github.io/DT/ for the full documentation.
library(DT)

# see the package vignette for examples and the link to website
vignette('DT', package = 'DT')

# some boring edge cases for testing purposes
m = matrix(nrow = 0, ncol = 5, dimnames = list(NULL, letters[1:5]))
datatable(m)  # zero rows
datatable(as.data.frame(m))

m = matrix(1, dimnames = list(NULL, 'a'))
datatable(m)  # one row and one column
datatable(as.data.frame(m))

m = data.frame(a = 1, b = 2, c = 3)
datatable(m)
datatable(as.matrix(m))

# dates
datatable(data.frame(
  date = seq(as.Date("2015-01-01"), by = "day", length.out = 5), x = 1:5
))
datatable(data.frame(x = Sys.Date()))
datatable(data.frame(x = Sys.time()))
Output a desctable to the desired target format
desc_output(desctable, target = c("df", "pander", "DT"), digits = 2, ...)
desctable |
The desctable to output |
target |
The desired target. One of "df", "pander", or "DT". |
digits |
The number of digits to display. The p values will be simplified under 1E-digits |
... |
Other arguments to pass to |
Output a simple or grouped desctable to different formats. Currently available formats are
data.frame ("df")
pander ("pander")
datatable ("DT")
All numerical values will be rounded to the digits argument. If statistical tests are present, p values below 1E-digits will be replaced with "< 1E-digits" (e.g. "< 0.01" for values below 0.01 when digits = 2).
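The rounding rule for p values can be illustrated with a small stand-alone function. format_p is a hypothetical helper written for this example; it mimics the behaviour described above and is not the code used by desc_output.

```r
# Hypothetical helper: round p values to `digits`, and replace values below
# 10^-digits with a "< threshold" string
format_p <- function(p, digits = 2) {
  threshold <- 10^-digits
  ifelse(p < threshold,
         paste0("< ", format(threshold, scientific = FALSE)),
         format(round(p, digits), nsmall = digits))
}

out <- format_p(c(0.2, 0.0342, 0.00004), digits = 2)
out
```

With digits = 2, the last value falls below 0.01 and is therefore shown as "< 0.01", while the others are simply rounded.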
The output object (or corresponding side effect)
Other desc_table core functions: desc_table(), desc_tests()
Generate a statistics table with the chosen statistical functions, nested if called with a grouped dataframe.
desc_table(data, ..., .auto, .labels)

## Default S3 method:
desc_table(data, ..., .auto, .labels)

## S3 method for class 'data.frame'
desc_table(data, ..., .labels = NULL, .auto = stats_auto)

## S3 method for class 'grouped_df'
desc_table(data, ..., .auto = stats_auto, .labels = NULL)
data |
The dataframe to analyze |
... |
A list of named statistics to apply to each element of the dataframe, or a function returning a list of named statistics |
.auto |
A function to automatically determine appropriate statistics |
.labels |
A named character vector of variable labels |
A simple or grouped descriptive table
The statistical functions to use in the table are passed as additional arguments. If the argument is named (e.g. N = length) the name will be used as the column title instead of the function name (here, N instead of length).
Any R function can be a statistical function, as long as it returns only one value when applied to a vector, or as many values as there are levels in a factor, plus one.
Users can also use purrr::map-like formulas as quick anonymous functions (e.g. Q1 = ~ quantile(., .25) to get the first quartile in a column named Q1).
If no statistical function is given to desc_table, the .auto argument is used to provide a function that automatically determines the most appropriate statistical functions to use based on the contents of the table.
.labels is a named character vector to provide "pretty" labels to variables. If given, the variable names for which there is a label will be replaced by their corresponding label. Not all variables need to have a label, and labels for non-existing variables are ignored. Labels must be given in the form c(unquoted_variable_name = "label").
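To make the idea of named statistics and formula shortcuts concrete, here is a base R sketch of how a one-sided formula can act as an anonymous function. formula_to_fn is a hypothetical helper written for this illustration (purrr::as_mapper offers the real conversion); it is not how desc_table is implemented.

```r
# Hypothetical helper: turn a one-sided formula such as ~ quantile(., .25)
# into a function of one argument, with `.` standing for that argument
formula_to_fn <- function(f) {
  rhs <- f[[2]]                    # expression on the right of the tilde
  env <- environment(f)            # where other names are looked up
  function(x) eval(rhs, list(. = x), env)
}

# Named statistics: the names become the column titles
stats <- list(N = length,
              Q1 = formula_to_fn(~ quantile(., .25)),
              Mean = mean)

# One row per numeric column of iris, one column per statistic
numeric_cols <- iris[sapply(iris, is.numeric)]
tab <- sapply(stats, function(s) {
  sapply(numeric_cols, function(col) unname(s(col)))
})
tab
```

Each statistic here returns a single value per vector, which is the condition stated above for non-factor variables.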
The output is either a dataframe in the case of a simple descriptive table, or nested dataframes in the case of a comparative table.
Other desc_table core functions: desc_output(), desc_tests()
iris %>%
  desc_table()

# Does the same as stats_auto here
iris %>%
  desc_table("N"    = length,
             "Min"  = min,
             "Q1"   = ~quantile(., .25),
             "Med"  = median,
             "Mean" = mean,
             "Q3"   = ~quantile(., .75),
             "Max"  = max,
             "sd"   = sd,
             "IQR"  = IQR)

# With grouping on a factor
iris %>%
  group_by(Species) %>%
  desc_table(.auto = stats_auto)
Add test statistics to a grouped desc_table, with the tests specified as variable = test.
desc_tests(desctable, .auto = tests_auto, .default = NULL, ...)
desctable |
A desc_table |
.auto |
A function to automatically determine the appropriate tests |
.default |
A default fallback test |
... |
A list of statistical tests associated to variable names |
A desc_table with tests
The statistical test functions to use in the table are passed as additional named arguments. Tests must be preceded by a formula tilde (~). name = ~test will apply test test to variable name.
Any R test function can be used, as long as it returns an object containing a p.value element, which is the case for most tests returning an object of class htest.
Users can also use purrr::map-like formulas as quick anonymous functions (e.g. ~ t.test(., var.equal = T) to compute a t test without the Welch correction).
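The requirement that a test return an object with a p.value element is easy to check for any candidate function. A short base R sketch, using a formula on iris chosen for illustration:

```r
# Three candidate tests for the same comparison; each returns an "htest" object
tests <- list(
  kruskal = kruskal.test(Sepal.Length ~ Species, data = iris),
  welch   = oneway.test(Sepal.Length ~ Species, data = iris),
  anova   = oneway.test(Sepal.Length ~ Species, data = iris, var.equal = TRUE)
)

sapply(tests, inherits, what = "htest")    # all TRUE
sapply(tests, function(res) res$p.value)   # the element reported in the table
```

Any function whose result lacks a p.value element would not satisfy the condition above.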
Other desc_table core functions: desc_output(), desc_table()
iris %>%
  group_by(Species) %>%
  desc_table() %>%
  desc_tests(Sepal.Length = ~kruskal.test,
             Sepal.Width  = ~oneway.test,
             Petal.Length = ~oneway.test(., var.equal = T),
             Petal.Width  = ~oneway.test(., var.equal = F))
Generate a statistics table with the chosen statistical functions, and tests if given a "grouped" dataframe.
desctable(data, stats, tests, labels)

## Default S3 method:
desctable(data, stats = stats_auto, tests, labels = NULL)

## S3 method for class 'grouped_df'
desctable(data, stats = stats_auto, tests = tests_auto, labels = NULL)
data |
The dataframe to analyze |
stats |
A list of named statistics to apply to each element of the dataframe, or a function returning a list of named statistics |
tests |
A list of statistical tests to use when calling desctable with a grouped_df |
labels |
A named character vector of labels to use instead of variable names |
A desctable object, which prints to a table of statistics for all variables
labels is an optional named character vector used to make the table prettier. If given, the variable names for which there is a label will be replaced by their corresponding label. Not all variables need to have a label, and labels for non-existing variables are ignored. Labels must be given in the form c(unquoted_variable_name = "label").
The stats can be a function which takes a dataframe and returns a list of statistical functions to use. stats can also be a named list of statistical functions, or purrr::map-like formulas. The names will be used as column names in the resulting table. If an element of the list is a function, it will be used as-is for the stats.
The tests can be a function which takes a variable and a grouping variable, and returns an appropriate statistical test to use in that case. tests can also be a named list of statistical test functions, associating the name of a variable in the data and a test to use specifically for that variable. That test name must be expressed as a single-term formula (e.g. ~t.test), or a purrr::map-like formula (e.g. ~t.test(., var.equal = T)). You don't have to specify tests for all the variables: a default test for all other variables can be defined with the name .default, and an automatic test can be defined with the name .auto.
If data is a grouped dataframe (using group_by), subtables are created and statistical tests are performed over each sub-group.
The output is a desctable object, which is a list of named dataframes that can be further manipulated. Methods for printing, using in pander and DT are present. Printing reduces the object to a dataframe.
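The label rule described above (replace names that have a label, keep the rest, ignore labels for variables that do not exist) can be sketched in base R. apply_labels is a hypothetical helper for illustration only, and the not_a_column entry is made up to show the "ignored" case.

```r
# Hypothetical helper: swap variable names for their labels where one exists
apply_labels <- function(var_names, labels) {
  has_label <- var_names %in% names(labels)
  var_names[has_label] <- labels[var_names[has_label]]
  var_names
}

labels <- c(hp = "Horse Power",
            cyl = "Cylinders",
            not_a_column = "Ignored")   # no such variable in mtcars

apply_labels(names(mtcars), labels)
```

Variables such as mpg keep their original name because no label was supplied for them.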
iris %>%
  desctable()

# Does the same as stats_auto here
iris %>%
  desctable(stats = list("N"    = length,
                         "Mean" = ~ if (is.normal(.)) mean(.),
                         "sd"   = ~ if (is.normal(.)) sd(.),
                         "Med"  = stats::median,
                         "IQR"  = ~ if(!is.factor(.)) IQR(.)))

# With labels
mtcars %>%
  desctable(labels = c(hp  = "Horse Power",
                       cyl = "Cylinders",
                       mpg = "Miles per gallon"))

# With grouping on a factor
iris %>%
  group_by(Species) %>%
  desctable(stats = stats_default)

# With nested grouping, on arbitrary variables
mtcars %>%
  group_by(vs, cyl) %>%
  desctable()

# With grouping on a condition, and choice of tests
iris %>%
  group_by(Petal.Length > 5) %>%
  desctable(tests = list(.auto = tests_auto, Species = ~chisq.test))
Performs Fisher's exact test for testing the null of independence of rows and columns in a contingency table with fixed marginals, or with a formula expression.
fisher.test(
  x,
  y,
  workspace,
  hybrid,
  control,
  or,
  alternative,
  conf.int,
  conf.level,
  simulate.p.value,
  B
)

## Default S3 method:
fisher.test(x, ...)

## S3 method for class 'formula'
fisher.test(
  x,
  y = NULL,
  workspace = 200000,
  hybrid = F,
  control = list(),
  or = 1,
  alternative = "two.sided",
  conf.int = T,
  conf.level = 0.95,
  simulate.p.value = F,
  B = 2000
)
x |
either a two-dimensional contingency table in matrix form, a factor object, or a formula of the form |
y |
a factor object; ignored if |
workspace |
an integer specifying the size of the workspace
used in the network algorithm. In units of 4 bytes. Only used for
non-simulated p-values larger than |
hybrid |
a logical. Only used for larger than |
control |
a list with named components for low level algorithm
control. At present the only one used is |
or |
the hypothesized odds ratio. Only used in the
|
alternative |
indicates the alternative hypothesis and must be
one of |
conf.int |
logical indicating if a confidence interval for the
odds ratio in a |
conf.level |
confidence level for the returned confidence
interval. Only used in the |
simulate.p.value |
a logical indicating whether to compute
p-values by Monte Carlo simulation, in larger than |
B |
an integer specifying the number of replicates used in the Monte Carlo test. |
... |
additional params to feed to original fisher.test |
If x is a matrix, it is taken as a two-dimensional contingency table, and hence its entries should be nonnegative integers. Otherwise, both x and y must be vectors of the same length. Incomplete cases are removed, the vectors are coerced into factor objects, and the contingency table is computed from these.
For 2 by 2 cases, p-values are obtained directly using the (central or non-central) hypergeometric distribution. Otherwise, computations are based on a C version of the FORTRAN subroutine FEXACT which implements the network developed by Mehta and Patel (1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN code can be obtained from http://www.netlib.org/toms/643. Note this fails (with an error message) when the entries of the table are too large. (It transposes the table if necessary so it has no more rows than columns. One constraint is that the product of the row marginals be less than 2^31 - 1.)
For 2 by 2 tables, the null of conditional independence is equivalent to the hypothesis that the odds ratio equals one. Exact inference can be based on observing that in general, given all marginal totals fixed, the first element of the contingency table has a non-central hypergeometric distribution with non-centrality parameter given by the odds ratio (Fisher, 1935). The alternative for a one-sided test is based on the odds ratio, so alternative = "greater" is a test of the odds ratio being bigger than the hypothesized odds ratio or.
Two-sided tests are based on the probabilities of the tables, and take as more extreme all tables with probabilities less than or equal to that of the observed table, the p-value being the sum of such probabilities.
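For a 2 by 2 table with or = 1, both p-value definitions above can be reproduced with the central hypergeometric distribution. The sketch below uses the tea-tasting table from the examples of this help page; the small relative tolerance in the comparison is an assumption added here to guard against floating-point ties.

```r
TeaTasting <- matrix(c(3, 1, 1, 3), nrow = 2)

row1 <- sum(TeaTasting[1, ])   # first row total
row2 <- sum(TeaTasting[2, ])   # second row total
col1 <- sum(TeaTasting[, 1])   # first column total

# All values the first cell can take when the margins are held fixed
support <- max(0, col1 - row2):min(row1, col1)
probs <- dhyper(support, m = row1, n = row2, k = col1)
observed <- dhyper(TeaTasting[1, 1], m = row1, n = row2, k = col1)

# Two-sided: sum the probabilities of all tables no more likely than the observed
p_two_sided <- sum(probs[probs <= observed * (1 + 1e-7)])

# One-sided "greater": tables whose first cell is at least as large as observed
p_greater <- sum(probs[support >= TeaTasting[1, 1]])

c(by_hand = p_two_sided, fisher.test = fisher.test(TeaTasting)$p.value)
c(by_hand = p_greater,
  fisher.test = fisher.test(TeaTasting, alternative = "greater")$p.value)
```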
For larger than 2 by 2 tables and hybrid = TRUE, asymptotic chi-squared probabilities are only used if the ‘Cochran conditions’ are satisfied, that is if no cell has count zero, and more than 80% of the cells have counts at least 5; otherwise the exact calculation is used.
Simulation is done conditional on the row and column marginals, and works only if the marginals are strictly positive. (A C translation of the algorithm of Patefield (1981) is used.)
A list with class "htest" containing the following components:
p.value: the p-value of the test.
conf.int: a confidence interval for the odds ratio. Only present in the 2 by 2 case and if argument conf.int = TRUE.
estimate: an estimate of the odds ratio. Note that the _conditional_ Maximum Likelihood Estimate (MLE) rather than the unconditional MLE (the sample odds ratio) is used. Only present in the 2 by 2 case.
null.value: the odds ratio under the null, or. Only present in the 2 by 2 case.
alternative: a character string describing the alternative hypothesis.
method: the character string "Fisher's Exact Test for Count Data".
data.name: a character string giving the names of the data.
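The difference between the conditional MLE reported in estimate and the sample odds ratio can be seen on the twin-convictions table from the examples of this help page. A minimal base R sketch:

```r
Convictions <- matrix(c(2, 10, 15, 3), nrow = 2,
                      dimnames = list(c("Dizygotic", "Monozygotic"),
                                      c("Convicted", "Not convicted")))

# Unconditional MLE: the plain cross-product (sample) odds ratio
sample_or <- (Convictions[1, 1] * Convictions[2, 2]) /
  (Convictions[1, 2] * Convictions[2, 1])

fit <- fisher.test(Convictions)

c(sample = sample_or, conditional_mle = unname(fit$estimate))
fit$conf.int     # confidence interval for the odds ratio (2 by 2 case only)
fit$null.value   # odds ratio under the null, i.e. the `or` argument
```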
Agresti, A. (1990) _Categorical data analysis_. New York: Wiley. Pages 59-66.
Agresti, A. (2002) _Categorical data analysis_. Second edition. New York: Wiley. Pages 91-101.
Fisher, R. A. (1935) The logic of inductive inference. _Journal of the Royal Statistical Society Series A_ *98*, 39-54.
Fisher, R. A. (1962) Confidence limits for a cross-product ratio. _Australian Journal of Statistics_ *4*, 41.
Fisher, R. A. (1970) _Statistical Methods for Research Workers._ Oliver & Boyd.
Mehta, C. R. and Patel, N. R. (1986) Algorithm 643. FEXACT: A Fortran subroutine for Fisher's exact test on unordered r*c contingency tables. _ACM Transactions on Mathematical Software_, *12*, 154-161.
Clarkson, D. B., Fan, Y. and Joe, H. (1993) A Remark on Algorithm 643: FEXACT: An Algorithm for Performing Fisher's Exact Test in r x c Contingency Tables. _ACM Transactions on Mathematical Software_, *19*, 484-488.
Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. _Applied Statistics_ *30*, 91-97.
fisher.exact in package exact2x2 for alternative interpretations of two-sided tests and confidence intervals for 2 by 2 tables.
## Not run:
## Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinker
## A British woman claimed to be able to distinguish whether milk or
## tea was added to the cup first. To test, she was given 8 cups of
## tea, in four of which milk was added first. The null hypothesis
## is that there is no association between the true order of pouring
## and the woman's guess, the alternative that there is a positive
## association (that the odds ratio is greater than 1).
TeaTasting <- matrix(c(3, 1, 1, 3),
                     nrow = 2,
                     dimnames = list(Guess = c("Milk", "Tea"),
                                     Truth = c("Milk", "Tea")))
fisher.test(TeaTasting, alternative = "greater")
## => p = 0.2429, association could not be established

## Fisher (1962, 1970), Criminal convictions of like-sex twins
Convictions <- matrix(c(2, 10, 15, 3),
                      nrow = 2,
                      dimnames = list(c("Dizygotic", "Monozygotic"),
                                      c("Convicted", "Not convicted")))
Convictions
fisher.test(Convictions, alternative = "less")
fisher.test(Convictions, conf.int = FALSE)
fisher.test(Convictions, conf.level = 0.95)$conf.int
fisher.test(Convictions, conf.level = 0.99)$conf.int

## A r x c table Agresti (2002, p. 57) Job Satisfaction
Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4,
              dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"),
                              satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))
fisher.test(Job)
fisher.test(Job, simulate.p.value = TRUE, B = 1e5)
## End(Not run)
Safe version of IQR for statify
IQR(x)
x |
A vector |
The IQR
Test if distribution is normal. The condition for normality is length > 30 and a non-significant Shapiro-Wilk test with p > .1
is.normal(x)
x |
A numerical vector |
A boolean
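The stated rule can be written out directly with shapiro.test from base R. is_normal_sketch is a hypothetical re-statement of the condition above, not the package's own code; dropping missing values and rejecting non-numeric input are assumptions made for this sketch.

```r
# Hypothetical re-statement of the rule: more than 30 values and a
# non-significant Shapiro-Wilk test (p > .1)
is_normal_sketch <- function(x) {
  x <- x[!is.na(x)]                 # assumption: missing values are ignored
  is.numeric(x) && length(x) > 30 && shapiro.test(x)$p.value > 0.1
}

is_normal_sketch(1:10)              # too short, so FALSE regardless of shape
is_normal_sketch(iris$Sepal.Width)  # long enough, decided by the test
```

The && operators short-circuit, so shapiro.test is only called once the type and length conditions hold.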
An empty test
no.test(formula)
formula |
A formula |
Pander method to output a desctable
## S3 method for class 'desctable'
pander(
  x = NULL,
  digits = 2,
  justify = "left",
  missing = "",
  keep.line.breaks = T,
  split.tables = Inf,
  emphasize.rownames = F,
  ...
)
x |
A desctable |
digits |
passed to |
justify |
defines alignment in cells passed to |
missing |
string to replace missing values |
keep.line.breaks |
(default: |
split.tables |
where to split wide tables to separate tables. The default value ( |
emphasize.rownames |
boolean (default: |
... |
unsupported extra arguments directly placed into |
Uses pandoc.table, with some default parameters (digits = 2, justify = "left", missing = "", keep.line.breaks = T, split.tables = Inf, and emphasize.rownames = F), that you can override if needed.
Return a compatible vector of length nlevels(x) + 1 to print the percentages of each level of a factor
percent(x)
x |
A factor |
A nlevels(x) + 1 length vector of percentages
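A vector of that shape can be produced in base R as follows. percent_sketch is a hypothetical stand-in written for illustration; in particular, placing the extra element first and leaving it as NA (a slot for the factor's own row) is an assumption, not something this page states.

```r
# Hypothetical stand-in: one leading placeholder, then one percentage per level
percent_sketch <- function(x) {
  c(NA, as.vector(table(x)) / length(x) * 100)
}

pct <- percent_sketch(iris$Species)
pct
length(pct) == nlevels(iris$Species) + 1
```

This matches the rule given for statistical functions elsewhere in this manual: as many values as there are levels in a factor, plus one.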
Print method for desctable
## S3 method for class 'desctable'
print(x, ...)
x |
A desctable |
... |
Additional print parameters |
A flat dataframe
This function takes a dataframe as argument and returns a list of statistics in the form accepted by desctable.
stats_auto(data)
data |
The dataframe to apply the statistic to |
You can define your own automatic function, as long as it takes a dataframe as argument and returns a list of functions, or formulas defining conditions to use a stat function.
A list of statistics to use, assessed from the content of the dataframe
Define a list of default statistics
stats_default(data)
stats_normal(data)
stats_nonnormal(data)
data |
A dataframe |
A list of statistical functions
This function takes a variable and a grouping variable as arguments, and returns a statistical test to use, expressed as a single-term formula.
tests_auto(var, grp)
var |
The variable to test |
grp |
The variable for the groups |
This function uses appropriate non-parametric tests depending on the number of levels (wilcox.test for two levels and kruskal.test for more), and fisher.test with fallback on chisq.test on error for factors.
A statistical test function
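The selection logic described for tests_auto can be approximated in base R. Both functions below are hypothetical sketches written for this illustration (the names pick_test_sketch and fisher_or_chisq are not part of the package), and the exact conditions used by the package may differ.

```r
# Hypothetical sketch: choose a test, returned as a single-term formula
pick_test_sketch <- function(var, grp) {
  grp <- factor(grp)
  if (is.factor(var)) {
    ~ fisher.test        # categorical variable
  } else if (nlevels(grp) == 2) {
    ~ wilcox.test        # numeric variable, two groups
  } else {
    ~ kruskal.test       # numeric variable, more than two groups
  }
}

# Hypothetical sketch of the fallback: try fisher.test, use chisq.test on error
fisher_or_chisq <- function(tab) {
  tryCatch(fisher.test(tab), error = function(e) chisq.test(tab))
}

pick_test_sketch(iris$Sepal.Length, iris$Species)   # three groups
pick_test_sketch(mtcars$mpg, mtcars$am)             # two groups
fisher_or_chisq(table(mtcars$cyl, mtcars$am))$p.value
```

A function of this shape, taking a variable and a grouping variable and returning a formula, is the kind of object the documentation above describes for the tests and .auto arguments.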