Rでまとめてt検定を行った結果を一覧で確認する方法のメモ

Rでまとめてt検定を行った結果を一覧で確認する方法のメモ。

要旨としては

まとめてt検定を行った時の結果をデータフレーム（tibble）形式で表示してくれる{broom}パッケージのtidy()関数をつかって表示する。

というもの。

まとめてt検定を行う方法についてはこちらの方法を使っていく。

www.rpubs.com

こちらの方法を使うと、グループ間での異なる変数について複数のt検定等を行う事ができる。

示されている例にならって、試しにやってみると次のようになる。

data(cancer, package="survival")
# 上記のページに提示されている関数を使う。
multi.tests <- function(fun = t.test, df, vars, group.var, ...) {
    sapply(simplify = FALSE,                                    # sapply(simplify=T) better, elements named
           vars,                                                # loop on vector of outcome variable names
           function(var) {
               formula <- as.formula(paste(var, "~", group.var))# create a formula with outcome and grouping var.
               fun(data = df, formula, ...)                     # perform test with a given fun, default t.test
           }
           )
}
# まとめてt検定を行う。
res.multi.t.tests <-
    multi.tests(fun = t.test,
                df = kidney,
                vars = c("time","age","frail"),
                group.var = "sex",
                var.equal = TRUE)
# 結果を確認する。
res.multi.t.tests

## $time
## 
##  Two Sample t-test
## 
## data:  time by sex
## t = -1.706, df = 74, p-value = 0.09221
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -124.551228    9.651228
## sample estimates:
## mean in group 1 mean in group 2 
##           59.30          116.75 
## 
## 
## $age
## 
##  Two Sample t-test
## 
## data:  age by sex
## t = -0.069294, df = 74, p-value = 0.9449
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -7.970091  7.434376
## sample estimates:
## mean in group 1 mean in group 2 
##        43.50000        43.76786 
## 
## 
## $frail
## 
##  Two Sample t-test
## 
## data:  frail by sex
## t = 0.95032, df = 74, p-value = 0.345
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -0.1872248  0.5286533
## sample estimates:
## mean in group 1 mean in group 2 
##        1.310000        1.139286

結果をt検定のそれぞれの結果がprint()（print.htest()）で表示した時の状態で表示される。

少量なら別にそんなに気にならないが、画面がスクロールしてしまうので少し見にくい。統計量やp値や95％信頼区間をデータフレーム形式で表示できると嬉しい。

このように計算結果の値をデータフレーム形式（tibble形式）で出力する関数としてとして{broom}のtidy()があり、これはt検定などの結果のhtestクラスにも対応している。

そのためたとえば、

ans <- t.test(extra ~ group, data = sleep)
broom::tidy(ans)

## # A tibble: 1 × 10
##   estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
##      <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl>
## 1    -1.58      0.75      2.33     -1.86  0.0794      17.8    -3.37     0.205
## # ℹ 2 more variables: method <chr>, alternative <chr>

と一連の計算結果がデータフレーム形式で表示できる、この関数を用いてmulti.test()で計算された結果を表示できると嬉しい。

multi.test()の結果はリスト形式になっているので、purrr::map_df()で取り出していくことにする。{broom}のtidy()はhtestクラスだと何と何の統計結果なのか（print()した時のdataのところ、list上のdata.nameのところ）を取り出さないので、それも取り出すようにすると次にようになる。

あらかじめinstall.packages("purrr")とinstall.packages("broom")で必要なパッケージをインストールしておく必要がある。

res.multi.t.tests |> 
  purrr::map_df(\(x) c(data = x$data.name, broom::tidy(x)))

## # A tibble: 3 × 11
##   data         estimate estimate1 estimate2 statistic p.value parameter conf.low
##   <chr>           <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
## 1 time by sex   -57.4       59.3     117.     -1.71    0.0922        74 -125.   
## 2 age by sex     -0.268     43.5      43.8    -0.0693  0.945         74   -7.97 
## 3 frail by sex    0.171      1.31      1.14    0.950   0.345         74   -0.187
## # ℹ 3 more variables: conf.high <dbl>, method <chr>, alternative <chr>

こうすると複数のt検定をしたときに、統計量やp値や95％信頼区間の上下限も一括で簡単に確認できるようになる。

備忘ログ

チラシの裏的備忘録&メモ

Rでまとめてt検定を行った結果を一覧で確認する方法のメモ