vignettes/multiple-files.Rmd
Suppose we want to load all the Eprime files in a directory and combine the results in a single dataframe.
My strategy in this scenario is to figure out what I need to do for a single file and then wrap those steps in a function that takes a filepath to a txt file and returns a dataframe. After some exploration and interactive programming, I come up with the following function.
    # Assumed source of read_eprime(), FrameList(), keep_levels(), to_data_frame()
    library("rprime")
    library("plyr")

    reduce_sails <- function(sails_path) {
      sails_lines <- read_eprime(sails_path)
      sails_frames <- FrameList(sails_lines)

      # Trials occur at level 3
      sails_frames <- keep_levels(sails_frames, 3)
      sails <- to_data_frame(sails_frames)

      # Tidy up
      to_pick <- c("Eprime.Basename", "Running", "Module", "Sound",
                   "Sample", "Correct", "Response")
      sails <- sails[to_pick]
      running_map <- c(TrialLists = "Trial", PracticeBlock = "Practice")
      sails$Running <- running_map[sails$Running]

      # Renumber trials in the practice and experimental blocks separately.
      # Numerically code correct response.
      sails <- ddply(sails, .(Running), mutate,
                     TrialNumber = seq(from = 1, to = length(Running)),
                     CorrectResponse = ifelse(Correct == Response, 1, 0))
      sails$Sample <- NULL

      # Optionally, one might save the processed file via:
      # csv <- paste0(tools::file_path_sans_ext(sails_path), ".csv")
      # write.csv(sails, csv, row.names = FALSE)
      sails
    }
Here’s a preview of what the function returns when given a filepath.
    head(reduce_sails("data/SAILS/SAILS_001X00XS1.txt"))
    #>   Eprime.Basename  Running Module     Sound Correct Response TrialNumber
    #> 1 SAILS_001X00XS1 Practice   LAKE LAKE1.WAV    Word     Word           1
    #> 2 SAILS_001X00XS1 Practice   LAKE  MAKE.WAV NotWord  NotWord           2
    #> 3 SAILS_001X00XS1 Practice   LAKE LAKE1.WAV    Word     Word           3
    #> 4 SAILS_001X00XS1 Practice   LAKE LAKE1.WAV    Word     Word           4
    #> 5 SAILS_001X00XS1 Practice   LAKE  MAKE.WAV NotWord  NotWord           5
    #> 6 SAILS_001X00XS1 Practice   LAKE LAKE1.WAV    Word     Word           6
    #>   CorrectResponse
    #> 1               1
    #> 2               1
    #> 3               1
    #> 4               1
    #> 5               1
    #> 6               1
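Two of the tidying steps inside `reduce_sails` can be tried in isolation: recoding labels by indexing a named vector, and numbering rows within each block. The sketch below uses a made-up `toy` dataframe rather than real Eprime output, and it swaps in base R's `ave` for the `ddply` plus `mutate` call, so it illustrates the idea rather than reproducing the function above.

```r
# Toy stand-in for trial-level data; the values are invented for illustration.
toy <- data.frame(
  Running  = c("PracticeBlock", "PracticeBlock", "TrialLists", "TrialLists", "TrialLists"),
  Correct  = c("Word", "NotWord", "Word", "Word", "NotWord"),
  Response = c("Word", "Word", "Word", "NotWord", "NotWord"),
  stringsAsFactors = FALSE
)

# Recode by using the old labels to index a named vector.
# unname() drops the names that the lookup carries along.
running_map <- c(TrialLists = "Trial", PracticeBlock = "Practice")
toy$Running <- unname(running_map[toy$Running])

# Number rows 1..n within each Running block
# (a base-R counterpart to the ddply + mutate step).
toy$TrialNumber <- ave(seq_along(toy$Running), toy$Running, FUN = seq_along)

# Code accuracy as 1 (match) or 0 (mismatch).
toy$CorrectResponse <- ifelse(toy$Correct == toy$Response, 1, 0)

toy
```

The named-vector lookup relies on `Running` being a character column. If it were a factor, indexing would use the integer codes instead of the labels, so converting with `as.character` first would be the safer choice.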
Now that the function works on one file, I can use `ldply` to apply the function to several files, returning the results in a single dataframe. (With dplyr, I would instead `lapply` the function over the paths to get a list of dataframes, then use `bind_rows` to combine them into a single dataframe.)
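    # `pattern` is a regular expression: escape the dot and anchor at the end
    # so that only names ending in ".txt" match.
    sails_paths <- list.files("data/SAILS/", pattern = "\\.txt$", full.names = TRUE)
    sails_paths
    #> [1] "data/SAILS/SAILS_001X00XS1.txt" "data/SAILS/SAILS_002X00XS1.txt"

    ensemble <- ldply(sails_paths, reduce_sails)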
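The dplyr route mentioned in the parenthetical could look roughly like the following. So that the sketch runs without any Eprime files, `fake_reduce` is a hypothetical stand-in for `reduce_sails` that returns a one-row dataframe per path. `ensemble_sketch` is likewise an illustrative name.

```r
library("dplyr")

# Hypothetical stand-in for reduce_sails(): builds a one-row dataframe
# from the path alone, without reading any file.
fake_reduce <- function(path) {
  data.frame(
    Eprime.Basename = tools::file_path_sans_ext(basename(path)),
    SourcePath = path,
    stringsAsFactors = FALSE
  )
}

paths <- c("data/SAILS/SAILS_001X00XS1.txt",
           "data/SAILS/SAILS_002X00XS1.txt")

# lapply() returns a list of dataframes; bind_rows() stacks them into one.
frames <- lapply(paths, fake_reduce)
ensemble_sketch <- bind_rows(frames)
ensemble_sketch
```

In real use, `fake_reduce` would be replaced by `reduce_sails` and `paths` by the result of `list.files`.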
Finally, with all of the subjects’ data contained in a single dataframe, I can use `ddply` plus `summarise` to compute summary scores at different levels of aggregation within each subject.
    # Score trials within subjects
    overall <- ddply(ensemble, .(Eprime.Basename, Running), summarise,
                     Score = sum(CorrectResponse),
                     PropCorrect = Score / length(CorrectResponse))
    overall
    #>   Eprime.Basename  Running Score PropCorrect
    #> 1 SAILS_001X00XS1 Practice    10   1.0000000
    #> 2 SAILS_001X00XS1    Trial    61   0.8714286
    #> 3 SAILS_002X00XS1 Practice     9   0.9000000
    #> 4 SAILS_002X00XS1    Trial    57   0.8142857

    # Score modules within subjects
    modules <- ddply(ensemble, .(Eprime.Basename, Running, Module), summarise,
                     Score = sum(CorrectResponse),
                     PropCorrect = mean(CorrectResponse))
    modules
    #>    Eprime.Basename  Running Module Score PropCorrect
    #> 1  SAILS_001X00XS1 Practice   LAKE    10   1.0000000
    #> 2  SAILS_001X00XS1    Trial    CAT     9   0.9000000
    #> 3  SAILS_001X00XS1    Trial   LAKE    10   1.0000000
    #> 4  SAILS_001X00XS1    Trial    RAT    16   0.8000000
    #> 5  SAILS_001X00XS1    Trial    SUE    26   0.8666667
    #> 6  SAILS_002X00XS1 Practice   LAKE     9   0.9000000
    #> 7  SAILS_002X00XS1    Trial    CAT     8   0.8000000
    #> 8  SAILS_002X00XS1    Trial   LAKE    10   1.0000000
    #> 9  SAILS_002X00XS1    Trial    RAT    11   0.5500000
    #> 10 SAILS_002X00XS1    Trial    SUE    28   0.9333333
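The two summaries compute `PropCorrect` differently: the first divides `Score` by the number of trials, the second takes the mean of `CorrectResponse`. Because `CorrectResponse` is coded 0/1, the two should agree whenever there are no missing values. A small base-R check on made-up scores (the `scores` dataframe and its `Subject` column are hypothetical):

```r
# Invented 0/1 accuracy scores for two hypothetical subjects.
scores <- data.frame(
  Subject = c("A", "A", "A", "A", "B", "B", "B", "B"),
  CorrectResponse = c(1, 1, 0, 1, 0, 1, 0, 0)
)

# Proportion correct as number correct divided by number of trials.
score <- tapply(scores$CorrectResponse, scores$Subject, sum)
n_trials <- tapply(scores$CorrectResponse, scores$Subject, length)
prop_ratio <- score / n_trials

# Proportion correct as the mean of the 0/1 codes.
prop_mean <- tapply(scores$CorrectResponse, scores$Subject, mean)

# Check that the two calculations agree for each subject.
all.equal(prop_ratio, prop_mean)
```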