PLEASE NOTE: This is an R-Markdown file. It’s best opened in R-Studio, where it can be rendered (with the raw data) into an HTML file containing plots, output, and nicely formatted code, or at the very least, viewed with the proper syntax highlighting. Anything within {r} blocks is valid R code which can be run anywhere.

Description of the Data Processing Process

This first part is just informational, designed to tell you where to find everything and how to run this analysis.

Raw Data Locations

The raw data is stored in three places:

  1. prod_data_flow
  1. perc_fixations_v10_final.txt
  1. vn_durations.txt

For this script to run, all three must be present.

Statistical Processing Pipeline

There are exactly three R scripts that, when run in the proper ordering, will take us from raw data to final graphs and model outputs:

  1. language_analysis.Rmd - Prepares the raw data stored above, does the PCA, and gets everything in a format ready for analysis (stored in language2017_data.RData). This also outputs the raw PCA graphs. This script then takes as input the processed language2017_data.RData file and the output of the MCMC modeling scripts (generated partway through), and creates all the necessary graphs and analysis exports.

No additional code is needed to produce the final analysis, although you’ll likely want to create another R script file for running the mcmcglmm models, as it runs across many, many hours.

Data Output

This script will output everything to this folder. This includes plots, model output, and otherwise.

The remainder of this script will actually run the code.

Library Reading and Data Loading

Loading Libraries

# For LMER models
library(lme4)
## Loading required package: Matrix
library(lmerTest)
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
# For Bayesian LMER
library(MCMCglmm)
## Loading required package: coda
## Loading required package: ape
# For plotting
library(ggplot2)
library(gridExtra)
library(effects)
## Loading required package: carData
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
# For parallel processing
library(parallel)
# This gives us handy functions (summarySE, multiplot)
library(Rmisc)
## Loading required package: lattice
## Loading required package: plyr
# For handling curvaceous Data
library(splines)

# If you're running on a two core machine (read: any non-desktop Mac smaller than 15"), change this to 2 to restrict multi-core processing.
ncores <- 6

# Color scheme
genericpalette = c("#00274c","#ffcb05","#7a121c","#cc6600")
generictripalette = c("#00274c","#A8A8A8","#ffcb05")

# Shortcut to the working directory
wd <- getwd()

And let’s specify which participants are disqualified, nice and early:

disqualified.participants <- c("p01","p13","p14","p21","p22","p23","p29","p34","p49","p53","p70","p69","p79")
smallmask.participants <- c("p04","p05","p12","p25","p33","p36","p46","p58")
dqlist <- c(disqualified.participants,smallmask.participants)

Perception and Production Data Prep

This code loads and generates the perception and production data, saving it into an .RData file for later reference

### Initial Data Processing

# First, we load in the perception data and then remove practice data:

rawfixations <- read.table("perc_fixations_v10_final.txt", sep="\t",quote = "",header=TRUE)

# Remove Practice Data
rawfixations <- subset(rawfixations,rawfixations$block=="2")

# Now we need to calculate proper times, this uses an adaptation of a formula originally from KMG's Python script:

rawfixations$start <- (rawfixations$TRIAL_START_TIME + rawfixations$CURRENT_FIX_START) - rawfixations$IP_START_TIME

rawfixations$end <- (rawfixations$TRIAL_START_TIME + rawfixations$CURRENT_FIX_END) - rawfixations$IP_START_TIME

# Drop unused levels from Condition
rawfixations$condition <- droplevels(rawfixations$condition)

# Remove fixations before the vowel's start
#rawfixations <- subset(rawfixations, rawfixations$start > 0)

# This chunk turns the splice and v_end data into something more usable by R, as R reads both in as factors.
rawfixations$splice <- as.numeric(levels(rawfixations$splice))[rawfixations$splice]
rawfixations$v_end <- as.numeric(levels(rawfixations$v_end))[rawfixations$v_end]

# Now, we create a cleaned dataframe from the rawfixations data, dropping unneeded columns and renaming the ones we do need:

percraw <- data.frame(session_id=rawfixations$RECORDING_SESSION_LABEL,
                    trial_type=rawfixations$type,
                    trial_num=as.factor(gsub("Trial: ","",rawfixations$TRIAL_LABEL)),
                    image_right=rawfixations$image_right,
                    image_left=rawfixations$image_left,
                    target_audio=rawfixations$audio_file,
                    v_onset=rawfixations$v_onset,
                    v_end=rawfixations$v_end,
                    splice=rawfixations$splice,
                    target_side=rawfixations$condition,
                    target_type=rawfixations$audio_played,
                    total_fix_num=rawfixations$TRIAL_FIXATION_TOTAL,
                    cur_fix_num=rawfixations$CURRENT_FIX_INDEX,
                    cur_fix_start=rawfixations$start,
                    cur_fix_end=rawfixations$end,
                    cur_fix_pupil=rawfixations$CURRENT_FIX_PUPIL,
                    cur_fix_ia_index=rawfixations$CURRENT_FIX_INTEREST_AREA_INDEX,
                    cur_fix_ia_label=rawfixations$CURRENT_FIX_INTEREST_AREA_LABEL,
                    prev_fix_ia_index=rawfixations$PREVIOUS_FIX_INTEREST_AREA_INDEX,
                    prev_fix_ia_label=rawfixations$PREVIOUS_FIX_INTEREST_AREA_LABEL,
                    next_fix_ia_index=rawfixations$NEXT_FIX_INTEREST_AREA_INDEX,
                    next_fix_ia_label=rawfixations$NEXT_FIX_INTEREST_AREA_LABEL,
                    prev_sac_start_ia_index=rawfixations$PREVIOUS_SAC_START_INTEREST_AREA_INDEX,
                    prev_sac_start_ia_label=rawfixations$PREVIOUS_SAC_START_INTEREST_AREA_LABEL
                    )

# Now, let's create some extra columns.  First, we'll pull the a or b out of the session column to get the participant ID alone

percraw$participant <- as.factor(gsub("a|b","", percraw$session))

# And toss excluded participants

percraw <- subset(percraw, !(participant %in% dqlist))

# Then we'll pull p followed by two numbers out of session to get the A/B alone.

percraw$session <- as.factor(gsub("p[0-9][0-9]","", percraw$session))

# We'll get the target word from the audio file played, then re-name the column for easy use.

percraw$target_word <- as.factor(gsub("(\\.|_).*","", percraw$target_audio))
percraw$word <- percraw$target_word

# If it's got the character N in the target type, it's nasal. Stupid, but functional.

percraw$target_isnasal <- as.numeric(ifelse(grepl("N",percraw$target_type), "1", "0"))

# Now we figure out the fixated word and direction by removing junk from the cur_fix_ia_label column.  I love parsing other columns for data.

percraw$cur_fix_word <- as.factor(gsub("*(IMAGE_LEFT|IMAGE_RIGHT|\\.bmp)","", percraw$cur_fix_ia_label))
percraw$cur_fix_side <- as.factor(ifelse(grepl("LEFT",percraw$cur_fix_ia_label), "LEFT", "RIGHT"))

# Now, let's figure out if a given trial had an accurate fixation.  The reference to cur_fix_ia_index is needed because even if there's no fix on a labeled target, the laterality can be saved by experiment builder. In that case, you'd get as "accurate" a fixation which is on the (e.g.) right side of the screen, but not looking at the correct image.  Basically, if there's no "side" (because you're looking at nothing), Dataviewer considers that a look to the last side examined.

percraw$accuracy <- as.numeric(ifelse((percraw$cur_fix_side == percraw$target_side & percraw$cur_fix_ia_index != "."), "1","0"))

percraw$compaccuracy <- as.numeric(ifelse((percraw$cur_fix_side != percraw$target_side & percraw$cur_fix_ia_index != "."), "1","0"))

percraw$nullaccuracy <- as.numeric(ifelse((percraw$cur_fix_ia_index == "."), "1","0"))

# Now we parse out the degree of nasalization.

percraw$nasalization <- as.factor(ifelse((grepl("6",percraw$target_audio)), "late", ifelse((grepl("8",percraw$target_audio)), "early","del_nas")))

# Create a version of type which combines voicing and nasalization

percraw$stimulus_type <- as.factor(paste(percraw$target_type,percraw$nasalization,sep="_"))

# We'll label the items which are the "final answers" (last fix in trial, or a second fix within the same IA)

percraw$is_final_fix <- as.numeric(ifelse((percraw$cur_fix_num == percraw$total_fix_num | percraw$cur_fix_ia_index == percraw$next_fix_ia_index), "1","0"))

# Label final accurate answers (which are different than accurate looks, where the person later looked away)

percraw$final_acc <- as.numeric(ifelse((percraw$is_final_fix == 1 | percraw$accuracy == 1), "1","0"))

# Label the items which are initial fixes (where they look one way, then look someplace else in the next fix)

percraw$is_initial_fix <- as.numeric(ifelse((percraw$prev_fix_ia_index == "." &
        percraw$cur_fix_ia_index != percraw$next_fix_ia_index &
        percraw$cur_fix_num != percraw$total_fix_num
        ), "1","0"))

# Label the items where they went to another IA before they came to this one

percraw$has_initial_fix <- as.numeric(ifelse((percraw$prev_fix_ia_index != "." &
        percraw$cur_fix_ia_index != percraw$prev_fix_ia_index
        ), "1","0"))

# Give each trial a unique identifier (p42a_127 is 'participant 42, session aperc, trial 127')

percraw$uniquetrial <- as.factor(paste(percraw$session_id,percraw$trial_num,sep="_"))

#Extract Structure and Onset information for later use:

percraw$structure <- as.factor(ifelse((grepl("ND",percraw$target_type)), "CVND", 
                                ifelse((grepl("NT",percraw$target_type)), "CVNT",
                                ifelse((grepl("D",percraw$target_type)), "CVD",
                                "CVT"))))
percraw$onset <- as.factor(ifelse((grepl("l",percraw$word)), "son", 
                                ifelse((grepl("w",percraw$word)), "son", 
                                ifelse((grepl("NVN",percraw$filename)), "nas", 
                                "obs"))))

percraw$onsstructure <- as.factor(paste0(percraw$structure,"_",percraw$onset))

# Code for NVN vs CVX type, as well as for early vs late nasalization

percraw$type <- as.factor(ifelse((grepl("NVN",percraw$structure)), "NVN", 
                                ifelse((grepl("N",percraw$structure)), "CVNC", 
                                "CVC")))

percraw$nasalization <- as.factor(ifelse((grepl("late",percraw$nasalization)), "late", 
                                    ifelse((grepl("early",percraw$nasalization)), "early",
                                    ifelse((grepl("CVC",percraw$type)), "oral",
                                    "del_nas"))))

# This is unpleasant, but we need to derive the competitor structure from this data, because it wasn't stored in the spreadsheet.  Sad.

# First we figure out which image was the competitor
percraw$compword <- as.factor(ifelse((percraw$target_side == "LEFT"), as.character(percraw$image_right), 
                              ifelse((percraw$target_side == "RIGHT"), as.character(percraw$image_left), "ELSE")))

# Now we clean it up by removing the .bmp
percraw$compword <- as.factor(gsub(".bmp","", percraw$compword))

# Now we parse the orthography (*shudder*) to get the word's type

percraw$compstructure <- as.factor(ifelse((grepl("nd",percraw$compword)), "CVND", 
                                ifelse((grepl("nt",percraw$compword)), "CVNT",
                                ifelse((grepl("d",percraw$compword)), "CVD",
                                "CVT"))))

percraw$comptype <- as.factor(ifelse((grepl("nd",percraw$compword)), "CVNC", 
                                ifelse((grepl("nt",percraw$compword)), "CVNC",
                                ifelse((grepl("d",percraw$compword)), "CVC",
                                "CVC"))))

# Let's make a few columns for recreating the JASA graphs

percraw$targetcomp <- as.factor(paste0(percraw$structure,"-",percraw$compstructure))

percraw$structurenas <- as.factor(paste0(percraw$structure,"_",percraw$nasalization))

percraw$targetcompnas <- as.factor(paste0(percraw$structurenas,"-",percraw$compstructure))

# Now we create a combined column with both structure and nasalization, and code by nasality and voicing

percraw$coding <- as.factor(paste(percraw$structure,percraw$nasalization,sep="_"))

percraw$isnasal <- as.factor(ifelse((grepl("N",percraw$structure)),"1", "0"))

percraw$voicing <- as.factor(ifelse((grepl("D",percraw$structure)),"1", "0"))

### Bucket Assignment

# **Step 1: Cut the start and end times into buckets**

# First, we calculate the buckets that each fixation starts in.  This uses R's cut() function, saying 'cut the range into 20ms chunks'

percraw$start.bucket <- as.numeric(cut(percraw$cur_fix_start, breaks = seq(0, 1000, by = 20)))

# For ending buckets, it's very possible that buckets could end after 1000ms, so break 0-3000ms.  This means that although all fixes start within 50 buckets, some don't end until higher bucket values

percraw$end.bucket <- as.numeric(cut(percraw$cur_fix_end, breaks = seq(0, 3000, by = 20)))

# You can un-comment this to write a clean version for external examination and backup, because disk space is cheap.

#write.table(perc,file="perc_data_clean.txt", sep="\t",row.names=FALSE)

# **Step 2: Create accuracy-labeled copies of every fixation/row for every bucket.  Let's start by creating a new DF to hold this:

trbuck <- percraw

# Sets a var which helps us handle the first row-append, below
count = 0

# The default is "this row is inaccurate".  Assigning it outside the loop saves time.

trbuck$bucketacc <- 0

# Now, for every bucket, we need to create a row for every fixation.  Each fixation is, in each bucket, either 'accurate' or 'inaccurate'.  So an accurate fixation from buckets 24-36 would have fifty rows, where bucketacc=1 only during rows 24-36.  This apply does that.

# By doing this, we'll end up with a copy of every fixation/row for every bucket, telling us whether that fixation was accurate during that bucket or not.  This first step creates a series of dataframes, one per bucket, which contain all the bucket-1, or bucket-2 etc rows.

processbucket <- function(bucket){
  
  # 1 - The fix in the row is "bucket accurate" (the fixation started in or before the bucket, ends in or after it, and it's looking at the target)
    blistacc <- subset(trbuck, start.bucket <= bucket & end.bucket >= bucket & accuracy == 1)
    blistacc$bucket <- bucket
    # Change the bucket accuracy to 1 for these only
    blistacc$bucketacc <- 1
    
    # 0 - The fix either starts after the bucket, ends before it, or isn't at the target, in which case it stays 0
    blistnofix <- subset(trbuck, start.bucket > bucket | end.bucket < bucket | accuracy == 0)
    blistnofix$bucket <- bucket
    assign(paste0("tb_",bucket), rbind(blistacc,blistnofix))
}

bucks <- c(1:50)

bucketapply <- lapply(bucks,processbucket)
tbuckets <- do.call(rbind.data.frame,bucketapply)

# Now, we'll rename to preserve the rawfixations data, give a shorter name:

perc.bucketed <- tbuckets

# The above code generates a lot of duplicated items.  Now we need to fix that.  Concatenate a set of factors which is unique to each bucket in each trial (there should be only one entry for each of these).

perc.bucketed$uniquecheck <- as.factor(paste(perc.bucketed$session_id,perc.bucketed$trial_num,perc.bucketed$bucket, sep="_"))

# First, we mark out all cases where there are multiple fixations per trial per bucket and label any duplicates of an existing trial/bucket (as each trial should be either accurate or inaccurate for each bucket).  

# * Note that duplicated() only marks subsequent reps of uniquecheck as DUPE=TRUE.  So, this will preserve the first row where there are several for the trial/bucket.  Which is why we added the accurate fixations BEFORE innaccuate ones.  
# * If there's one bucket-accurate fix/row, it'll always mark that one as DUPE=FALSE, as it'll be on top.
# * If there are two bucket-accurate fixes/rows, the topmost will be kept.  This only happens when an accurate fix ends in a bucket, and a new one starts in the same bucket.  Basically an Eyelink error. 
# * Otherwise, it'll DUPE=FALSE the first bucket-inaccurate fix/row, and mark the rest as DUPE=TRUE.

perc.bucketed$dupe <- duplicated(perc.bucketed$uniquecheck)

# Now, we remove all of the ones marked duplicated.  This means that the only thing left is one row per bucket per trial, which is bucketacc == 1 if there was an accurate fix at that bucket, and 0 otherwise.

perc.bucketed <- subset(perc.bucketed, dupe == FALSE)

# Now, the bucketing is done!

### Final Perc Data Prep

# Let's generate millisecond-numbered buckets

perc.bucketed$start.bucket.ms <- 20*(perc.bucketed$start.bucket)
perc.bucketed$end.bucket.ms <- 20*(perc.bucketed$end.bucket)
perc.bucketed$bucket.ms <- 20*(perc.bucketed$bucket)

# We'll do some re-names for ease of access.

pb <- perc.bucketed

###################
# Airflow Data Prep
###################

# Now, let's prepare the production data.  First, we load in the raw data by combining all files in a folder, then combine them.

setwd("prod_data_flow")
file_list <- list.files()
rawflow <- do.call("rbind",lapply(file_list, FUN=function(files){read.table(files, header=TRUE, sep="\t")}))

# Make a copy to work on
flow <- rawflow

# Now, let's parse the file names for data.

# Strip some cruft from the filenames
cleanfile <- as.factor(gsub("NVN_", "", flow$filename))
cleanfile <- as.factor(gsub("_flow", "", cleanfile))

# Split the filename by _ to get some info
fileinfo <- data.frame(do.call(rbind, strsplit(as.character(cleanfile), split = "_")))

# Label the columns you've just created by splitting the filename
colnames(fileinfo) <- c("participant", "word", "rep")

# Recombine the split info with the actual files
flow <- cbind(flow,fileinfo)

# Code for structure of the words
flow$structure <- as.factor(ifelse((grepl("NVN",flow$filename)), "NVN", 
                                ifelse((grepl("nt",flow$filename)), "CVNT", 
                                ifelse((grepl("nd",flow$filename)), "CVND",
                                ifelse((grepl("d",flow$filename)), "CVD", 
                                "CVT")))))

# And toss excluded participants
flow <- subset(flow, !(participant %in% dqlist))


# That was fun.  Now, we parse the word itself to identify sonorant onsets, and code that.

flow$onset <- as.factor(ifelse((grepl("l",flow$word)), "son", 
                                ifelse((grepl("w",flow$word)), "son", 
                                ifelse((grepl("NVN",flow$filename)), "nas", 
                                "obs"))))
                                
flow$onsstructure <- as.factor(paste0(flow$structure,"_",flow$onset))


# Google Drive is a joy, and creates lots of duplicated files.  This removes those.

        
flow$dupecheck <- as.factor(ifelse((grepl("__1_",flow$filename)), "1", "0"))

flow <- subset(flow, dupecheck != 1)


# Now let's code for voicing and NVN vs CVX type

flow$voicing <- as.factor(ifelse((grepl("D",flow$structure)), "1", "0"))
                        
flow$type <- as.factor(ifelse((grepl("NVN",flow$structure)), "NVN", 
                                ifelse((grepl("N",flow$structure)), "CVNC", 
                                "CVC")))

### Getting per-speaker flow information

# Now, we need to come up with some information about each speaker.  Let's do that now, by generating a function which, run on a speaker, gives us all this info.

getflowinfo <- function(spk){

    # Get the mean of oral flow
    oralmean <- mean(flow$or_flow_ml[which(flow$participant == spk)])
    oralsd <- sd(flow$or_flow_ml[which(flow$participant == spk)])
    # If it's less than 5, set it to five to avoid tiny ranges
    if (oralsd < 5){
        oralsd <- 5
    }
    
    nasalsd <- sd(flow$nas_flow_ml[which(flow$participant == spk)])
    # If it's less than 5, set it to five to avoid tiny ranges
    if (nasalsd < 5){
        nasalsd <- 5
    }
    # Get the speaker's ymin and ymax for graphing
    ymin <- -1*oralsd
    ymax <- 4*oralsd
    nymin <- -1*nasalsd
    nymax <- 4*nasalsd
    
    # Get flow info during oral vowels
    tempflow <- subset(flow, flow$participant == spk & flow$type == "CVC" & flow$meas_type == "v")
    oralnasflowsd <- sd(tempflow$nas_flow_ml)
    oralnasflowmean <- mean(tempflow$nas_flow_ml)
    
    # Calculate the nasality threshold using this information
    thresh <- oralnasflowmean + 2*oralnasflowsd
    
    # Calculate the airflow for CVC in the middle 50% of the vowel
    tempflow <- subset(flow, flow$participant == spk & flow$type == "CVC" & flow$point > 7 & flow$point < 17)
    middlefiftyoralmean <- mean(tempflow$nas_flow_ml)

    # Calculate the airflow for NVN in the outer 50% of the vowel
    tempflow <- subset(flow, flow$participant == spk & flow$type == "NVN" & flow$point >= 7)
    mean1 <- mean(tempflow$nas_flow_ml)
    tempflow <- subset(flow, flow$participant == spk & flow$type == "NVN" & flow$point <= 17)
    mean2 <- mean(tempflow$nas_flow_ml)
    outerfiftynasalmean <- mean(c(mean1,mean2))
    
    # Get the mean airflow in CVC
    tempflow <- subset(flow, flow$participant == spk & flow$type == "CVC")
    oralmean <- mean(tempflow$nas_flow_ml)

    # Get the mean airflow in NVN
    tempflow <- subset(flow, flow$participant == spk & flow$type == "NVN")
    nasalmean <- mean(tempflow$nas_flow_ml)
    
    # Get the mean airflow in CVT, and generate a matching threshold
    tempflow <- subset(flow, flow$participant == spk & flow$structure == "CVT" & flow$meas_type == "v")
    oralnasflowsd <- sd(tempflow$nas_flow_ml)
    oralnasflowmean <- mean(tempflow$nas_flow_ml)
    # Calculate the nasality threshold 
    cvtthresh <- oralnasflowmean + 2*oralnasflowsd

  # Get the mean airflow in CVT, and generate a matching threshold  
    tempflow <- subset(flow, flow$participant == spk & flow$structure == "CVD" & flow$meas_type == "v")
    oralnasflowsd <- sd(tempflow$nas_flow_ml)
    oralnasflowmean <- mean(tempflow$nas_flow_ml)
    # Calculate the nasality threshold 
    cvdthresh <- oralnasflowmean + 2*oralnasflowsd
    
    # Save all of this as a list
    infolist <- list("participant" = spk, "oralmean" = oralmean, "oralsd" = oralsd, "ymin" = ymin, "ymax" = ymax,"nymin" = nymin, "nymax" = nymax, "nasalthreshold" = thresh,"cvtthresh" = cvtthresh,"cvdthresh" = cvdthresh,"cvcnasmean.mid"= middlefiftyoralmean,"nvnnasmean.mid"=outerfiftynasalmean)
    return(infolist)    
}

# Now, we make a list of all of our speakers and apply that function to each speaker, using mclapply for multi-core speed.

speaks <- unique(flow$participant)
infolist <- mclapply(speaks,getflowinfo)

# This creates a list with all the per-speaker information, indexed by speaker.  Let's re-combine that with the data-frame, using "participant" to unite the two.

infodf <- do.call(rbind.data.frame,infolist)
flow <- merge(flow,infodf,by='participant')

# Now let's pull the DQed participants (defined above) from production...

flow <- subset(flow, !(participant %in% dqlist))

# Now let's pull the /h/ words from production...

flow <- subset(flow, word != "hit" & word != "hint")

# Flow PCA (All CVNC Data, No NVNs)

# OK, now it's time to pull out the PCA-related data.

# First, we'll do the PCA analysis.  We select the vowel only, and then reduce the number of columns we have to care about...

vowelflow <- subset(flow, point > 0 & point <= 25 & type != "CVC")
vfreduced <- data.frame(nas_flow_ml=vowelflow$nas_flow_ml, point=vowelflow$point, participant=vowelflow$participant)

# Aggregate the data by participant and point.

vf.aggregate <- aggregate(nas_flow_ml ~ participant + point,data=vfreduced,FUN=mean)

# Now reshape this data frame so that we have point in columns for each speaker.

vft <- reshape(vf.aggregate,timevar="point",idvar=c("participant"),direction="wide")

# Now transform the participant name into a row title.  This way, it's out of the way, but still accessible.  This bothers me about R.  Row names are seldom useful, but sometimes crucial.

vfr <- data.frame(vft[,-1], row.names=vft[,1])

# It's PCA time.  Run a PCA on the 25 columns here (one per point), then rename.

all.pca <- prcomp(vfr[1:25],center=TRUE)
eig <- (all.pca$sdev)^2
variance <- eig*100/sum(eig)
variance

# Let's get a PCA scree plot

screeplot(all.pca, type="lines")

# Let's produce Figure 4

    pcmeans <- aggregate(nas_flow_ml ~ point,data=vfreduced,FUN=mean)
    pcmin <- c()
    pcmax <- c()

    point <- c()
    type <- c()
    num <- c()
    for (pcnum in 1:5){
        pcsdev <- all.pca$sdev[pcnum]
        pcmin <- c(pcmin,pcmeans$nas_flow_ml-(pcsdev*all.pca$rotation[,pcnum]))
        pcmax <- c(pcmax,pcmeans$nas_flow_ml+(pcsdev*all.pca$rotation[,pcnum]))
        point <- c(point,rep(c(1:nrow(pcmeans)),1))
      num <- c(num,rep(paste0("PC",pcnum),nrow(pcmeans)))
    }
    
    pcdf <- data.frame(num,point,pcmin,pcmax)

pcm5 <- data.frame(nas_flow_ml=rep(pcmeans$nas_flow_ml,5),point=rep(pcmeans$point,5),num=pcdf$num)
pcdf <- subset(pcdf,num == "PC1" | num=="PC2")  
pcm5 <- subset(pcm5,num == "PC1" | num=="PC2")  
f3 <- ggplot(data=pcdf, 
        aes(x=point)) +
        geom_line(aes(y=pcmax,color="PC Max"),size=2,linetype=2) +
    geom_line(aes(y=pcmin,color="PC Min"),size=2,linetype=2) +
    geom_ribbon(aes(ymin=pcmin,ymax=pcmax),fill="grey",alpha=0.5)+
    geom_line(data=pcmeans,aes(x=point,y=nas_flow_ml,group=1)) + 
    ylab("Nasal Airflow (in ml/sec)") +
        xlab("Normalized Time") + 
        generic_plot +
   scale_colour_manual(name=NULL,values=generictripalette) + scale_fill_manual(name=NULL,values=generictripalette) + generic_plot + 
    theme(legend.position = c(0.1, 0.8),legend.background = element_rect(color = "gray", size = 0.5, linetype = "solid"))+
  facet_grid(.~num)

f3
setwd(wd)
ggsave(file=paste0("figure_4.png"), width=dblw,height=dblh, unit="in",dpi=figdpi)

# OK, now we pull out the by-speaker results:

ppc <- data.frame(all.pca$x)

# ... and we graph the by-speaker results to tell us more about the PCs.

pcranks1 <- data.frame(participant=row.names(ppc[order(ppc$PC1,decreasing=TRUE),]),rank=1:nrow(ppc))
pcranks2 <- data.frame(participant=row.names(ppc[order(ppc$PC2,decreasing=TRUE),]),rank=1:nrow(ppc))
#pcranks3 <- data.frame(participant=row.names(ppc[order(ppc$PC3,decreasing=TRUE),]),rank=1:nrow(ppc))
pcranks <- merge(pcranks1,pcranks2,by="participant")
colnames(pcranks) <- c("participant", "pc1rank", "pc2rank")

pcinfo <- data.frame(participant=rownames(ppc),pc1=ppc$PC1,pc2=ppc$PC2,pc3=ppc$PC3)
pcranks <-merge(pcinfo,pcranks, by="participant")

flow <- merge(flow,pcranks,by="participant")
pb <- merge(pb,pcranks,by="participant")
percraw <- merge(percraw,pcranks,by='participant')

# Get VN Durations

# Read in the data...
setwd(wd)
rawvndurations <- read.table("vn_durations.txt", sep="\t",quote = "",header=TRUE)
vndur <- rawvndurations

# Strip some cruft from the filenames
vndur$remove <- as.factor(ifelse((grepl("__1_",vndur$filename)), "1", "0"))
vndur <- subset(vndur, vndur$remove == 0)
vndur$remove <- as.factor(ifelse((grepl("_isBad",vndur$filename)), "1", "0"))
vndur <- subset(vndur, vndur$remove == 0)

vndur$isflow <- as.factor(ifelse((grepl("_flow",vndur$filename)), "1", "0"))

vndur <- subset(vndur, vndur$label == "v" | vndur$label == "n")
cleanfile <- as.factor(gsub("NVN_", "", vndur$filename))
cleanfile <- as.factor(gsub("_flow", "", cleanfile))

# Split the filename by _ to get some info
fileinfo <- data.frame(do.call(rbind, strsplit(as.character(cleanfile), split = "_")))

# Label the columns you've just created by splitting the filename
colnames(fileinfo) <- c("participant", "word", "rep")


# Recombine the split info with the actual files
vndur <- cbind(vndur,fileinfo)

# Code for structure of the words
vndur$structure <- as.factor(ifelse((grepl("NVN",vndur$filename)), "NVN", 
                                ifelse((grepl("nt",vndur$filename)), "CVNT", 
                                ifelse((grepl("nd",vndur$filename)), "CVND",
                                ifelse((grepl("d",vndur$filename)), "CVD", 
                                "CVT")))))

vndur$onset <- as.factor(ifelse((grepl("l",vndur$word)), "son", 
                                ifelse((grepl("w",vndur$word)), "son", 
                                ifelse((grepl("NVN",vndur$filename)), "nas", 
                                "obs"))))
                                
vndur$onsstructure <- as.factor(paste0(vndur$structure,"_",vndur$onset))

vndur$duration <- 1000*(vndur$end - vndur$start)

vndur$voicing <- as.factor(ifelse((grepl("D",vndur$structure)),"1", "0"))

# Let's pull out the speaker data...

vdurs <- subset(vndur, vndur$label == "v")
ndurs <- subset(vndur, vndur$label == "n")
voicedvdurs <- subset(vndur, vndur$voicing == 1 & vndur$label == "v")
voicelessvdurs <- subset(vndur, vndur$voicing == 0 & vndur$label == "v")
voicedndurs <- subset(vndur, vndur$voicing == 1 & vndur$label == "n")
voicelessndurs <- subset(vndur, vndur$voicing == 0 & vndur$label == "n")

vcvtdurs <- subset(vndur, vndur$structure == "CVT" & vndur$label == "v")
vcvddurs <- subset(vndur, vndur$structure == "CVD" & vndur$label == "v")
vcvnddurs <- subset(vndur, vndur$structure == "CVND" & vndur$label == "v")
vcvntdurs <- subset(vndur, vndur$structure == "CVNT" & vndur$label == "v")

ncvnddurs <- subset(vndur, vndur$structure == "CVND" & vndur$label == "n")
ncvntdurs <- subset(vndur, vndur$structure == "CVNT" & vndur$label == "n")

vagg <- aggregate(duration ~ participant,data=vdurs,FUN=mean)
nagg <- aggregate(duration ~ participant,data=ndurs,FUN=mean)

voicedvagg <- aggregate(duration ~ participant,data=voicedvdurs,FUN=mean)
voicelessvagg <- aggregate(duration ~ participant,data=voicelessvdurs,FUN=mean)
voicednagg <- aggregate(duration ~ participant,data=voicedndurs,FUN=mean)
voicelessnagg <- aggregate(duration ~ participant,data=voicelessndurs,FUN=mean)

vcvtagg <- aggregate(duration ~ participant,data=vcvtdurs,FUN=mean)
vcvdagg <- aggregate(duration ~ participant,data=vcvddurs,FUN=mean)
vcvntagg <- aggregate(duration ~ participant,data=vcvntdurs,FUN=mean)
vcvndagg <- aggregate(duration ~ participant,data=vcvnddurs,FUN=mean)

ncvntagg <- aggregate(duration ~ participant,data=ncvntdurs,FUN=mean)
ncvndagg <- aggregate(duration ~ participant,data=ncvnddurs,FUN=mean)

spkdurdata <- data.frame(participant=vagg$participant,avg_vdur=vagg$duration,avg_ndur=nagg$duration,avg_cvt_vdur=vcvtagg$duration,avg_cvd_vdur=vcvdagg$duration,avg_cvnt_vdur=vcvntagg$duration,avg_cvnd_vdur=vcvndagg$duration,avg_cvnt_ndur=ncvntagg$duration,avg_cvnd_ndur=ncvndagg$duration,avg_voiced_vdur=voicedvagg$duration,avg_voiced_ndur=voicednagg$duration,avg_vless_vdur=voicelessvagg$duration,avg_vless_ndur=voicelessnagg$duration)

spkdurdata$vnratio <- spkdurdata$avg_vdur/spkdurdata$avg_ndur
spkdurdata$vnratio_voiced <- spkdurdata$avg_vless_vdur/spkdurdata$avg_vless_ndur
spkdurdata$vnratio_voiceless <- spkdurdata$avg_voiced_vdur/spkdurdata$avg_voiced_ndur

vfldur <- aggregate(duration ~ isflow*participant,data=vdurs,FUN=mean)
nfldur <- aggregate(duration ~ isflow*participant,data=ndurs,FUN=mean)
vnflow <- data.frame(participant=vfldur$participant,isflow=vfldur$isflow,avg_vdur=vfldur$duration,avg_ndur=nfldur$duration)

aggregate(duration ~ isflow*label,data=vndur,FUN=mean)
aggregate(duration ~ voicing*label,data=vndur,FUN=mean)

# Now let's merge it with the actual data...
pb <- merge(pb,spkdurdata,by='participant')
percraw <- merge(percraw,spkdurdata,by='participant')

# Data Output

# Finally, let's save an image of all of this prepared data, for faster loading later.  If we want to re-run all this in the future, just delete this file:

setwd(wd)
save(pb,percraw,flow,pcinfo,file="language2017_data.RData")

# And move back into the main working directory:

setwd(wd)

Loading Data

# Load the Data
load("language2017_data.RData")
pb$voicing <- as.factor(ifelse(pb$voicing == 0,"Voiceless","Voiced"))

Production Statistics

Production LMER with speaker

lmerdata <- subset(flow,type == "CVNC" & point > 0 & point <= 25)
plmer <- lmer(nas_flow_ml ~ bs(point,df=3)*voicing + (1|participant),data=lmerdata)

  print(summary(plmer))
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: nas_flow_ml ~ bs(point, df = 3) * voicing + (1 | participant)
##    Data: lmerdata
## 
## REML criterion at convergence: 499966.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.9621 -0.6123 -0.0804  0.5047 11.6920 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  participant (Intercept) 1.423    1.193   
##  Residual                3.105    1.762   
## Number of obs: 125825, groups:  participant, 42
## 
## Fixed effects:
##                                 Estimate   Std. Error           df t value
## (Intercept)                      0.95113      0.18583     43.00000   5.118
## bs(point, df = 3)1               0.59198      0.07486 125776.00000   7.907
## bs(point, df = 3)2              -2.21324      0.05089 125776.00000 -43.495
## bs(point, df = 3)3               4.87415      0.03903 125776.00000 124.896
## voicing1                        -0.26794      0.03457 125776.00000  -7.752
## bs(point, df = 3)1:voicing1      0.32353      0.10186 125776.00000   3.176
## bs(point, df = 3)2:voicing1      1.32004      0.06923 125776.00000  19.067
## bs(point, df = 3)3:voicing1     -0.45337      0.05310 125776.00000  -8.539
##                                         Pr(>|t|)    
## (Intercept)                  0.00000703389195600 ***
## bs(point, df = 3)1           0.00000000000000266 ***
## bs(point, df = 3)2          < 0.0000000000000002 ***
## bs(point, df = 3)3          < 0.0000000000000002 ***
## voicing1                     0.00000000000000910 ***
## bs(point, df = 3)1:voicing1              0.00149 ** 
## bs(point, df = 3)2:voicing1 < 0.0000000000000002 ***
## bs(point, df = 3)3:voicing1 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) bs(,d=3)1 bs(,d=3)2 bs(,d=3)3 vocng1 b(,d=3)1:
## bs(pn,d=3)1 -0.113                                               
## bs(pn,d=3)2 -0.007 -0.393                                        
## bs(pn,d=3)3 -0.105  0.777    -0.376                              
## voicing1    -0.100  0.609     0.038     0.565                    
## bs(,d=3)1:1  0.083 -0.735     0.289    -0.571    -0.828          
## bs(,d=3)2:1  0.005  0.289    -0.735     0.276    -0.052 -0.393   
## bs(,d=3)3:1  0.077 -0.571     0.276    -0.735    -0.768  0.777   
##             b(,d=3)2:
## bs(pn,d=3)1          
## bs(pn,d=3)2          
## bs(pn,d=3)3          
## voicing1             
## bs(,d=3)1:1          
## bs(,d=3)2:1          
## bs(,d=3)3:1 -0.376

mcmcglmm Modeling code

The below code is intended to be run as a separate script, which requires ~24 hours to run on a very fast OS X or Unix machine with > 7 cores available, and which has the language2017_data.RData file in the same folder as this script. When run, it will produce the completed analysis files which are read in in subsequent sections.

Please note that, because this is a Bayesian model which relies on entropy to fit parameters, the exact parameters arrived at using the below code may not be identical to those reported in the paper, and some marginal parameters may flit in and out of significance. This is particularly true given the complexity of time’s representation here, where the same curve can be represented using B-splines as “high intercept, low control points” or “low intercept, high control points”. When this happens, other parameters will change accordingly, although the predicted curves (e.g. Figure 11) remain the same.

# For Bayesian LMER
library(MCMCglmm)
# For parallel processing
library(parallel)
# For handling curvaceous Data
library(splines)

nitt_num = 150000
thin_num = 500
burnin_num = 5000

load("language2017_data.RData")

# Work on only the last 800ms of the trial
pb <- subset(pb, bucket.ms > 200)

###### EARLY LATE MODEL

spldf= 3
lmerdata <- subset(pb,type == "CVNC" & nasalization != "del_nas" & comptype == "CVC")

prior <- list(R=list(V=1, n=0.0001, fix = 1),
              G=list(G1=list(V=diag(6), n=0.0001),
                     G2=list(V=1, n=0.0001)))

perc_earlylate_revised <- mclapply(1:6, function(i) {
MCMCglmm(bucketacc ~ bs(bucket.ms,df=3)*nasalization*voicing, random=~idh(1+nasalization+bs(bucket.ms,df=3)):word+participant,
        data=lmerdata,
        family="categorical",
        thin   = thin_num,
        burnin = burnin_num,
        nitt   = nitt_num,
        prior=prior)}, mc.cores=6)

save(perc_earlylate_revised, file = "perc_earlylate_revised.RData")
rm(perc_earlylate_revised)
gc()
    

prodperc_earlylate_revised <- mclapply(1:6, function(i) {
  MCMCglmm(bucketacc ~ bs(bucket.ms,df=3)*nasalization*pc2*voicing, random=~idh(1+nasalization+bs(bucket.ms,df=3)):word+participant,
           data=lmerdata,
           family="categorical",
           thin   = thin_num,
           burnin = burnin_num,
           nitt   = nitt_num,
           prior=prior)}, mc.cores=6)

save(prodperc_earlylate_revised, file = "prodperc_earlylate_revised.RData")
rm(prodperc_earlylate_revised)
gc()


# DELETED NASALS MODEL

lmerdata <- subset(pb,type == "CVNC" & nasalization == "del_nas" & comptype == "CVC")

prior <- list(R=list(V=1, n=0.0001, fix = 1),
              G=list(G1=list(V=diag(4), n=0.0001),
                     G2=list(V=1, n=0.0001)))

perc_delnas_revised <- mclapply(1:6, function(i) {
  MCMCglmm(bucketacc ~ bs(bucket.ms,df=3)*voicing, random=~idh(1+bs(bucket.ms,df=3)):word+participant,
           data=lmerdata,
           family="categorical",
           thin   = thin_num,
           burnin = burnin_num,
           nitt   = nitt_num,
           prior=prior)}, mc.cores=6)

save(perc_delnas_revised, file = "perc_delnas_revised.RData")
rm(perc_delnas_revised)
gc()

prodperc_delnas_revised <- mclapply(1:6, function(i) {
  MCMCglmm(bucketacc ~ bs(bucket.ms,df=3)*pc2*voicing, random=~idh(1+bs(bucket.ms,df=3)):word+participant,
           data=lmerdata,
           family="categorical",
           thin   = thin_num,
           burnin = burnin_num,
           nitt   = nitt_num,
           prior=prior)}, mc.cores=6)

save(prodperc_delnas_revised, file = "prodperc_delnas_revised.RData")
rm(prodperc_delnas_revised)
gc()

Perception Model Diagnostics

Model Diagnostics Function

These functions just allow us to rapidly work-up the models.

plot.estimates <- function(x) {
  if (class(x) != "summary.mcmc")
    x <- summary(x)
  n <- dim(x$statistics)[1]
  par(mar=c(2, 7, 4, 1))
  plot(x$statistics[,1], n:1,
       yaxt="n", ylab="",
       xlim=range(x$quantiles)*1.2,
       pch=19,
       main="Posterior means and 95% credible intervals")
  grid()
  axis(2, at=n:1, rownames(x$statistics), las=2)
  arrows(x$quantiles[,1], n:1, x$quantiles[,5], n:1, code=0)
  abline(v=0, lty=2)
}

model_workup <- function(modellist) {
  onemod <- modellist[[2]]
  print("FIXED EFFECTS FORMULA")
  print(onemod$Fixed$formula)
  print("RANDOM EFFECTS FORMULA")
  print(onemod$Random$formula)
  model <- lapply(modellist, function(m) m$Sol)
  model <- do.call(mcmc.list, model)
  modsum <- summary(onemod)
  modsols <- modsum$solutions
  modsols[,c(1:3)] <- round(modsols[,c(1:3)],2)
  modwriteup <- data.frame(name=rownames(modsols),beta=modsols[,1],interval=paste0("[",modsols[,2]," : ",modsols[,3],"]"),pval=round(modsols[,5],5))
  rownames(modwriteup) <- NULL
  # Change some names
  modwriteup[1] <- lapply(modwriteup[1], gsub, pattern = "bs(bucket.ms, k = 3)", replacement = "Time – ", fixed = TRUE)
  modwriteup[1] <- lapply(modwriteup[1], gsub, pattern = "(Intercept)", replacement = "Intercept", fixed = TRUE)
  modwriteup[1] <- lapply(modwriteup[1], gsub, pattern = ":", replacement = " * ", fixed = TRUE)
  modwriteup[1] <- lapply(modwriteup[1], gsub, pattern = "voicing1", replacement = "Voicing (Voiced)", fixed = TRUE)
  modwriteup[1] <- lapply(modwriteup[1], gsub, pattern = "nasalizationearly", replacement = "Nasalization (Early)", fixed = TRUE)
  modwriteup[1] <- lapply(modwriteup[1], gsub, pattern = "pc", replacement = "PC", fixed = TRUE)
  modsig <- subset(modwriteup,pval<0.05)
  modsig$pval <- ifelse(modsig$pval<0.001,"<.001",modsig$pval)

  print(modsum)
  print(data.frame(modwriteup))
  modname <- paste0("modeloutput_",deparse(substitute(modellist)),".txt")
  {sink(modname, append=FALSE, split=FALSE)
  print("FIXED EFFECTS FORMULA")
  print(onemod$Fixed$formula)
  print("RANDOM EFFECTS FORMULA")
  print(onemod$Random$formula)
  print(modsum)
  print(summary(model))
  print("CAMERA READY MODEL OUTPUT")
  write.table(modwriteup,row.names = FALSE,sep="\t",quote=FALSE)
  cat("\n")
  print("CAMERA READY MODEL OUTPUT (SIGNIFICANT ONLY)")
  write.table(modsig,row.names = FALSE,sep="\t",quote=FALSE)
  sink()}
  plot.estimates(model)
  # Gelman diagnostics
  par(mfrow=c(4,2), mar=c(2,2,1,2))
  #gelman.plot(model, auto.layout=F)
  print(gelman.diag(model))
}

Perception Model - Auditory CVNC Trials

Model Diagnostics - Perception Model - Auditory CVNC Trials

load("perc_earlylate_revised.RData")
model_workup(perc_earlylate_revised)
## [1] "FIXED EFFECTS FORMULA"
## bucketacc ~ bs(bucket.ms, df = 3) * nasalization * voicing
## <environment: 0x7ff59c5e9d48>
## [1] "RANDOM EFFECTS FORMULA"
## ~idh(1 + nasalization + bs(bucket.ms, df = 3)):word + participant
## <environment: 0x7ff59c5e9d48>
## 
##  Iterations = 5001:149501
##  Thinning interval  = 500
##  Sample size  = 290 
## 
##  DIC: 85330.12 
## 
##  G-structure:  ~idh(1 + nasalization + bs(bucket.ms, df = 3)):word
## 
##                             post.mean   l-95% CI u-95% CI eff.samp
## (Intercept).word              0.71517 0.11511417   1.6492      290
## nasalizationearly.word        0.14363 0.00002036   0.4240      290
## nasalizationlate.word         0.08705 0.00001975   0.3256      290
## bs(bucket.ms, df = 3)1.word  10.06393 2.76686676  24.3911      290
## bs(bucket.ms, df = 3)2.word   1.42938 0.32096024   2.8808      290
## bs(bucket.ms, df = 3)3.word   1.13279 0.26108007   2.8777      290
## 
##                ~participant
## 
##             post.mean l-95% CI u-95% CI eff.samp
## participant     1.428   0.9062    2.092      290
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units         1        1        1        0
## 
##  Location effects: bucketacc ~ bs(bucket.ms, df = 3) * nasalization * voicing 
## 
##                                                   post.mean l-95% CI
## (Intercept)                                        -6.29784 -7.28714
## bs(bucket.ms, df = 3)1                              4.32589  1.84325
## bs(bucket.ms, df = 3)2                             10.31221  9.14344
## bs(bucket.ms, df = 3)3                              9.37604  8.36572
## nasalizationearly                                  -0.61246 -1.19442
## voicing1                                            0.46086 -0.72983
## bs(bucket.ms, df = 3)1:nasalizationearly            1.81289  0.91117
## bs(bucket.ms, df = 3)2:nasalizationearly            0.16711 -0.26869
## bs(bucket.ms, df = 3)3:nasalizationearly            1.03086  0.54436
## bs(bucket.ms, df = 3)1:voicing1                    -2.30955 -6.21189
## bs(bucket.ms, df = 3)2:voicing1                     0.19247 -1.25684
## bs(bucket.ms, df = 3)3:voicing1                    -0.20294 -1.63526
## nasalizationearly:voicing1                         -0.64841 -1.55569
## bs(bucket.ms, df = 3)1:nasalizationearly:voicing1   1.30531 -0.15802
## bs(bucket.ms, df = 3)2:nasalizationearly:voicing1   0.81584  0.20172
## bs(bucket.ms, df = 3)3:nasalizationearly:voicing1   0.50015 -0.40266
##                                                   u-95% CI eff.samp  pMCMC
## (Intercept)                                       -5.42749    462.7 <0.003
## bs(bucket.ms, df = 3)1                             7.33599    290.0 <0.003
## bs(bucket.ms, df = 3)2                            11.50519    290.0 <0.003
## bs(bucket.ms, df = 3)3                            10.43772    385.3 <0.003
## nasalizationearly                                  0.03244    221.2 0.0414
## voicing1                                           1.70558    290.0 0.3448
## bs(bucket.ms, df = 3)1:nasalizationearly           2.87699    185.8 0.0069
## bs(bucket.ms, df = 3)2:nasalizationearly           0.56395    290.0 0.4690
## bs(bucket.ms, df = 3)3:nasalizationearly           1.67367    217.2 0.0069
## bs(bucket.ms, df = 3)1:voicing1                    1.89508    290.0 0.2759
## bs(bucket.ms, df = 3)2:voicing1                    2.15919    290.0 0.8897
## bs(bucket.ms, df = 3)3:voicing1                    1.19115    204.0 0.7379
## nasalizationearly:voicing1                         0.28106    237.2 0.1586
## bs(bucket.ms, df = 3)1:nasalizationearly:voicing1  2.79984    182.8 0.1103
## bs(bucket.ms, df = 3)2:nasalizationearly:voicing1  1.39530    290.0 0.0138
## bs(bucket.ms, df = 3)3:nasalizationearly:voicing1  1.45340    192.3 0.2690
##                                                     
## (Intercept)                                       **
## bs(bucket.ms, df = 3)1                            **
## bs(bucket.ms, df = 3)2                            **
## bs(bucket.ms, df = 3)3                            **
## nasalizationearly                                 * 
## voicing1                                            
## bs(bucket.ms, df = 3)1:nasalizationearly          **
## bs(bucket.ms, df = 3)2:nasalizationearly            
## bs(bucket.ms, df = 3)3:nasalizationearly          **
## bs(bucket.ms, df = 3)1:voicing1                     
## bs(bucket.ms, df = 3)2:voicing1                     
## bs(bucket.ms, df = 3)3:voicing1                     
## nasalizationearly:voicing1                          
## bs(bucket.ms, df = 3)1:nasalizationearly:voicing1   
## bs(bucket.ms, df = 3)2:nasalizationearly:voicing1 * 
## bs(bucket.ms, df = 3)3:nasalizationearly:voicing1   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##                                                                name  beta
## 1                                                         Intercept -6.30
## 2                                            bs(bucket.ms, df = 3)1  4.33
## 3                                            bs(bucket.ms, df = 3)2 10.31
## 4                                            bs(bucket.ms, df = 3)3  9.38
## 5                                              Nasalization (Early) -0.61
## 6                                                  Voicing (Voiced)  0.46
## 7                     bs(bucket.ms, df = 3)1 * Nasalization (Early)  1.81
## 8                     bs(bucket.ms, df = 3)2 * Nasalization (Early)  0.17
## 9                     bs(bucket.ms, df = 3)3 * Nasalization (Early)  1.03
## 10                        bs(bucket.ms, df = 3)1 * Voicing (Voiced) -2.31
## 11                        bs(bucket.ms, df = 3)2 * Voicing (Voiced)  0.19
## 12                        bs(bucket.ms, df = 3)3 * Voicing (Voiced) -0.20
## 13                          Nasalization (Early) * Voicing (Voiced) -0.65
## 14 bs(bucket.ms, df = 3)1 * Nasalization (Early) * Voicing (Voiced)  1.31
## 15 bs(bucket.ms, df = 3)2 * Nasalization (Early) * Voicing (Voiced)  0.82
## 16 bs(bucket.ms, df = 3)3 * Nasalization (Early) * Voicing (Voiced)  0.50
##           interval    pval
## 1  [-7.29 : -5.43] 0.00345
## 2    [1.84 : 7.34] 0.00345
## 3   [9.14 : 11.51] 0.00345
## 4   [8.37 : 10.44] 0.00345
## 5   [-1.19 : 0.03] 0.04138
## 6   [-0.73 : 1.71] 0.34483
## 7    [0.91 : 2.88] 0.00690
## 8   [-0.27 : 0.56] 0.46897
## 9    [0.54 : 1.67] 0.00690
## 10   [-6.21 : 1.9] 0.27586
## 11  [-1.26 : 2.16] 0.88966
## 12  [-1.64 : 1.19] 0.73793
## 13  [-1.56 : 0.28] 0.15862
## 14   [-0.16 : 2.8] 0.11034
## 15     [0.2 : 1.4] 0.01379
## 16   [-0.4 : 1.45] 0.26897

## Potential scale reduction factors:
## 
##                                                   Point est. Upper C.I.
## (Intercept)                                             1.01       1.02
## bs(bucket.ms, df = 3)1                                  1.00       1.01
## bs(bucket.ms, df = 3)2                                  1.01       1.02
## bs(bucket.ms, df = 3)3                                  1.00       1.01
## nasalizationearly                                       1.01       1.02
## voicing1                                                1.00       1.01
## bs(bucket.ms, df = 3)1:nasalizationearly                1.00       1.01
## bs(bucket.ms, df = 3)2:nasalizationearly                1.00       1.01
## bs(bucket.ms, df = 3)3:nasalizationearly                1.00       1.01
## bs(bucket.ms, df = 3)1:voicing1                         1.00       1.00
## bs(bucket.ms, df = 3)2:voicing1                         1.01       1.02
## bs(bucket.ms, df = 3)3:voicing1                         1.00       1.01
## nasalizationearly:voicing1                              1.00       1.01
## bs(bucket.ms, df = 3)1:nasalizationearly:voicing1       1.00       1.01
## bs(bucket.ms, df = 3)2:nasalizationearly:voicing1       1.00       1.01
## bs(bucket.ms, df = 3)3:nasalizationearly:voicing1       1.00       1.01
## 
## Multivariate psrf
## 
## 1.02

Deleted Nasals Perception Model

Model Diagnostics - DelNas Perception Model

load("perc_delnas_revised.RData")
model_workup(perc_delnas_revised)
## [1] "FIXED EFFECTS FORMULA"
## bucketacc ~ bs(bucket.ms, df = 3) * voicing
## <environment: 0x7ff582074d40>
## [1] "RANDOM EFFECTS FORMULA"
## ~idh(1 + bs(bucket.ms, df = 3)):word + participant
## <environment: 0x7ff582074d40>
## 
##  Iterations = 5001:149501
##  Thinning interval  = 500
##  Sample size  = 290 
## 
##  DIC: 54543.26 
## 
##  G-structure:  ~idh(1 + bs(bucket.ms, df = 3)):word
## 
##                             post.mean   l-95% CI u-95% CI eff.samp
## (Intercept).word               0.1822 0.00006097   0.5195    230.2
## bs(bucket.ms, df = 3)1.word    2.8921 0.45718453   6.6775    290.0
## bs(bucket.ms, df = 3)2.word    1.5797 0.34178191   3.8574    296.0
## bs(bucket.ms, df = 3)3.word    1.2089 0.17348960   2.4989    290.0
## 
##                ~participant
## 
##             post.mean l-95% CI u-95% CI eff.samp
## participant     1.105   0.6526    1.654      290
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units         1        1        1        0
## 
##  Location effects: bucketacc ~ bs(bucket.ms, df = 3) * voicing 
## 
##                                 post.mean l-95% CI u-95% CI eff.samp
## (Intercept)                       -6.1553  -6.7912  -5.5380    290.0
## bs(bucket.ms, df = 3)1             4.9369   3.2925   6.9540    288.5
## bs(bucket.ms, df = 3)2             7.3073   6.3432   8.5808    290.0
## bs(bucket.ms, df = 3)3             7.4105   6.3767   8.2779    290.0
## voicing1                           0.1073  -0.6171   0.7214    290.0
## bs(bucket.ms, df = 3)1:voicing1    0.3848  -2.1645   2.6977    290.0
## bs(bucket.ms, df = 3)2:voicing1   -2.4639  -4.0030  -0.9887    468.7
## bs(bucket.ms, df = 3)3:voicing1   -2.0913  -3.5225  -0.7870    290.0
##                                  pMCMC   
## (Intercept)                     <0.003 **
## bs(bucket.ms, df = 3)1          <0.003 **
## bs(bucket.ms, df = 3)2          <0.003 **
## bs(bucket.ms, df = 3)3          <0.003 **
## voicing1                        0.7379   
## bs(bucket.ms, df = 3)1:voicing1 0.7034   
## bs(bucket.ms, df = 3)2:voicing1 <0.003 **
## bs(bucket.ms, df = 3)3:voicing1 0.0069 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##                                        name  beta        interval    pval
## 1                                 Intercept -6.16 [-6.79 : -5.54] 0.00345
## 2                    bs(bucket.ms, df = 3)1  4.94   [3.29 : 6.95] 0.00345
## 3                    bs(bucket.ms, df = 3)2  7.31   [6.34 : 8.58] 0.00345
## 4                    bs(bucket.ms, df = 3)3  7.41   [6.38 : 8.28] 0.00345
## 5                          Voicing (Voiced)  0.11  [-0.62 : 0.72] 0.73793
## 6 bs(bucket.ms, df = 3)1 * Voicing (Voiced)  0.38   [-2.16 : 2.7] 0.70345
## 7 bs(bucket.ms, df = 3)2 * Voicing (Voiced) -2.46    [-4 : -0.99] 0.00345
## 8 bs(bucket.ms, df = 3)3 * Voicing (Voiced) -2.09 [-3.52 : -0.79] 0.00690

## Potential scale reduction factors:
## 
##                                 Point est. Upper C.I.
## (Intercept)                          1.005       1.02
## bs(bucket.ms, df = 3)1               1.003       1.01
## bs(bucket.ms, df = 3)2               0.999       1.00
## bs(bucket.ms, df = 3)3               1.005       1.01
## voicing1                             1.008       1.02
## bs(bucket.ms, df = 3)1:voicing1      1.003       1.01
## bs(bucket.ms, df = 3)2:voicing1      1.002       1.01
## bs(bucket.ms, df = 3)3:voicing1      1.003       1.01
## 
## Multivariate psrf
## 
## 1.01

Production/Perception - Auditory CVNC Trials

Model Diagnostics - Production/Perception - Early vs. Late

load("prodperc_earlylate_revised.RData")
model_workup(prodperc_earlylate_revised)
## [1] "FIXED EFFECTS FORMULA"
## bucketacc ~ bs(bucket.ms, df = 3) * nasalization * pc2 * voicing
## <environment: 0x7ff57edb4e68>
## [1] "RANDOM EFFECTS FORMULA"
## ~idh(1 + nasalization + bs(bucket.ms, df = 3)):word + participant
## <environment: 0x7ff57edb4e68>
## 
##  Iterations = 5001:149501
##  Thinning interval  = 500
##  Sample size  = 290 
## 
##  DIC: 85071.37 
## 
##  G-structure:  ~idh(1 + nasalization + bs(bucket.ms, df = 3)):word
## 
##                             post.mean   l-95% CI u-95% CI eff.samp
## (Intercept).word              0.54876 0.00005128   1.2713    172.8
## nasalizationearly.word        0.15033 0.00002694   0.4038    290.0
## nasalizationlate.word         0.09841 0.00001452   0.4366    290.0
## bs(bucket.ms, df = 3)1.word  11.24622 2.95228775  27.9163    407.0
## bs(bucket.ms, df = 3)2.word   1.40446 0.38333281   3.0419    290.0
## bs(bucket.ms, df = 3)3.word   1.02214 0.22645482   2.6965    290.0
## 
##                ~participant
## 
##             post.mean l-95% CI u-95% CI eff.samp
## participant     1.505   0.8791    2.186      290
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units         1        1        1        0
## 
##  Location effects: bucketacc ~ bs(bucket.ms, df = 3) * nasalization * pc2 * voicing 
## 
##                                                       post.mean  l-95% CI
## (Intercept)                                           -6.915767 -7.743446
## bs(bucket.ms, df = 3)1                                 5.390328  2.859941
## bs(bucket.ms, df = 3)2                                10.682374  9.594544
## bs(bucket.ms, df = 3)3                                 9.986890  8.985427
## nasalizationearly                                     -0.123068 -0.786363
## pc2                                                    0.736164  0.486934
## voicing1                                               0.683618 -0.368237
## bs(bucket.ms, df = 3)1:nasalizationearly               0.872800  0.016458
## bs(bucket.ms, df = 3)2:nasalizationearly              -0.088121 -0.524594
## bs(bucket.ms, df = 3)3:nasalizationearly               0.480548 -0.160025
## bs(bucket.ms, df = 3)1:pc2                            -1.284072 -1.632319
## bs(bucket.ms, df = 3)2:pc2                            -0.384117 -0.551374
## bs(bucket.ms, df = 3)3:pc2                            -0.882897 -1.102285
## nasalizationearly:pc2                                 -0.546837 -0.755204
## bs(bucket.ms, df = 3)1:voicing1                       -2.683812 -6.532643
## bs(bucket.ms, df = 3)2:voicing1                        0.024366 -1.160761
## bs(bucket.ms, df = 3)3:voicing1                       -0.380940 -1.657027
## nasalizationearly:voicing1                            -0.990788 -2.108011
## pc2:voicing1                                          -0.245909 -0.479687
## bs(bucket.ms, df = 3)1:nasalizationearly:pc2           1.214009  0.742549
## bs(bucket.ms, df = 3)2:nasalizationearly:pc2           0.014375 -0.183652
## bs(bucket.ms, df = 3)3:nasalizationearly:pc2           0.732738  0.438997
## bs(bucket.ms, df = 3)1:nasalizationearly:voicing1      2.024977  0.413074
## bs(bucket.ms, df = 3)2:nasalizationearly:voicing1      1.024117  0.395764
## bs(bucket.ms, df = 3)3:nasalizationearly:voicing1      0.888618  0.024289
## bs(bucket.ms, df = 3)1:pc2:voicing1                    0.726158  0.193745
## bs(bucket.ms, df = 3)2:pc2:voicing1                   -0.004039 -0.234465
## bs(bucket.ms, df = 3)3:pc2:voicing1                    0.151787 -0.148907
## nasalizationearly:pc2:voicing1                         0.481567  0.177663
## bs(bucket.ms, df = 3)1:nasalizationearly:pc2:voicing1 -1.183597 -1.885034
## bs(bucket.ms, df = 3)2:nasalizationearly:pc2:voicing1  0.086766 -0.218464
## bs(bucket.ms, df = 3)3:nasalizationearly:pc2:voicing1 -0.590471 -0.981646
##                                                        u-95% CI eff.samp
## (Intercept)                                           -6.049228    206.8
## bs(bucket.ms, df = 3)1                                 8.459577    290.0
## bs(bucket.ms, df = 3)2                                11.668149    290.0
## bs(bucket.ms, df = 3)3                                11.030594    290.0
## nasalizationearly                                      0.587095    226.4
## pc2                                                    0.979514    250.1
## voicing1                                               1.774489    290.0
## bs(bucket.ms, df = 3)1:nasalizationearly               2.115484    214.8
## bs(bucket.ms, df = 3)2:nasalizationearly               0.352476    290.0
## bs(bucket.ms, df = 3)3:nasalizationearly               1.098562    208.9
## bs(bucket.ms, df = 3)1:pc2                            -0.948788    169.3
## bs(bucket.ms, df = 3)2:pc2                            -0.252669    290.0
## bs(bucket.ms, df = 3)3:pc2                            -0.688501    223.0
## nasalizationearly:pc2                                 -0.308128    228.6
## bs(bucket.ms, df = 3)1:voicing1                        1.896107    290.0
## bs(bucket.ms, df = 3)2:voicing1                        1.816883    290.0
## bs(bucket.ms, df = 3)3:voicing1                        0.927967    290.0
## nasalizationearly:voicing1                            -0.019002    209.7
## pc2:voicing1                                          -0.004266    146.9
## bs(bucket.ms, df = 3)1:nasalizationearly:pc2           1.731645    243.7
## bs(bucket.ms, df = 3)2:nasalizationearly:pc2           0.207572    290.0
## bs(bucket.ms, df = 3)3:nasalizationearly:pc2           1.010692    344.3
## bs(bucket.ms, df = 3)1:nasalizationearly:voicing1      3.600006    189.9
## bs(bucket.ms, df = 3)2:nasalizationearly:voicing1      1.697389    229.7
## bs(bucket.ms, df = 3)3:nasalizationearly:voicing1      1.936430    208.2
## bs(bucket.ms, df = 3)1:pc2:voicing1                    1.157456    127.4
## bs(bucket.ms, df = 3)2:pc2:voicing1                    0.207462    290.0
## bs(bucket.ms, df = 3)3:pc2:voicing1                    0.441735    160.1
## nasalizationearly:pc2:voicing1                         0.825056    191.7
## bs(bucket.ms, df = 3)1:nasalizationearly:pc2:voicing1 -0.514573    214.2
## bs(bucket.ms, df = 3)2:nasalizationearly:pc2:voicing1  0.380630    290.0
## bs(bucket.ms, df = 3)3:nasalizationearly:pc2:voicing1 -0.155284    215.6
##                                                        pMCMC   
## (Intercept)                                           <0.003 **
## bs(bucket.ms, df = 3)1                                0.0138 * 
## bs(bucket.ms, df = 3)2                                <0.003 **
## bs(bucket.ms, df = 3)3                                <0.003 **
## nasalizationearly                                     0.7793   
## pc2                                                   <0.003 **
## voicing1                                              0.1862   
## bs(bucket.ms, df = 3)1:nasalizationearly              0.0897 . 
## bs(bucket.ms, df = 3)2:nasalizationearly              0.6759   
## bs(bucket.ms, df = 3)3:nasalizationearly              0.1172   
## bs(bucket.ms, df = 3)1:pc2                            <0.003 **
## bs(bucket.ms, df = 3)2:pc2                            <0.003 **
## bs(bucket.ms, df = 3)3:pc2                            <0.003 **
## nasalizationearly:pc2                                 <0.003 **
## bs(bucket.ms, df = 3)1:voicing1                       0.1862   
## bs(bucket.ms, df = 3)2:voicing1                       0.9655   
## bs(bucket.ms, df = 3)3:voicing1                       0.5586   
## nasalizationearly:voicing1                            0.0552 . 
## pc2:voicing1                                          0.0483 * 
## bs(bucket.ms, df = 3)1:nasalizationearly:pc2          <0.003 **
## bs(bucket.ms, df = 3)2:nasalizationearly:pc2          0.9310   
## bs(bucket.ms, df = 3)3:nasalizationearly:pc2          <0.003 **
## bs(bucket.ms, df = 3)1:nasalizationearly:voicing1     0.0138 * 
## bs(bucket.ms, df = 3)2:nasalizationearly:voicing1     0.0069 **
## bs(bucket.ms, df = 3)3:nasalizationearly:voicing1     0.0690 . 
## bs(bucket.ms, df = 3)1:pc2:voicing1                   0.0069 **
## bs(bucket.ms, df = 3)2:pc2:voicing1                   0.9655   
## bs(bucket.ms, df = 3)3:pc2:voicing1                   0.3379   
## nasalizationearly:pc2:voicing1                        <0.003 **
## bs(bucket.ms, df = 3)1:nasalizationearly:pc2:voicing1 <0.003 **
## bs(bucket.ms, df = 3)2:nasalizationearly:pc2:voicing1 0.5379   
## bs(bucket.ms, df = 3)3:nasalizationearly:pc2:voicing1 0.0069 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##                                                                      name
## 1                                                               Intercept
## 2                                                  bs(bucket.ms, df = 3)1
## 3                                                  bs(bucket.ms, df = 3)2
## 4                                                  bs(bucket.ms, df = 3)3
## 5                                                    Nasalization (Early)
## 6                                                                     PC2
## 7                                                        Voicing (Voiced)
## 8                           bs(bucket.ms, df = 3)1 * Nasalization (Early)
## 9                           bs(bucket.ms, df = 3)2 * Nasalization (Early)
## 10                          bs(bucket.ms, df = 3)3 * Nasalization (Early)
## 11                                           bs(bucket.ms, df = 3)1 * PC2
## 12                                           bs(bucket.ms, df = 3)2 * PC2
## 13                                           bs(bucket.ms, df = 3)3 * PC2
## 14                                             Nasalization (Early) * PC2
## 15                              bs(bucket.ms, df = 3)1 * Voicing (Voiced)
## 16                              bs(bucket.ms, df = 3)2 * Voicing (Voiced)
## 17                              bs(bucket.ms, df = 3)3 * Voicing (Voiced)
## 18                                Nasalization (Early) * Voicing (Voiced)
## 19                                                 PC2 * Voicing (Voiced)
## 20                    bs(bucket.ms, df = 3)1 * Nasalization (Early) * PC2
## 21                    bs(bucket.ms, df = 3)2 * Nasalization (Early) * PC2
## 22                    bs(bucket.ms, df = 3)3 * Nasalization (Early) * PC2
## 23       bs(bucket.ms, df = 3)1 * Nasalization (Early) * Voicing (Voiced)
## 24       bs(bucket.ms, df = 3)2 * Nasalization (Early) * Voicing (Voiced)
## 25       bs(bucket.ms, df = 3)3 * Nasalization (Early) * Voicing (Voiced)
## 26                        bs(bucket.ms, df = 3)1 * PC2 * Voicing (Voiced)
## 27                        bs(bucket.ms, df = 3)2 * PC2 * Voicing (Voiced)
## 28                        bs(bucket.ms, df = 3)3 * PC2 * Voicing (Voiced)
## 29                          Nasalization (Early) * PC2 * Voicing (Voiced)
## 30 bs(bucket.ms, df = 3)1 * Nasalization (Early) * PC2 * Voicing (Voiced)
## 31 bs(bucket.ms, df = 3)2 * Nasalization (Early) * PC2 * Voicing (Voiced)
## 32 bs(bucket.ms, df = 3)3 * Nasalization (Early) * PC2 * Voicing (Voiced)
##     beta        interval    pval
## 1  -6.92 [-7.74 : -6.05] 0.00345
## 2   5.39   [2.86 : 8.46] 0.01379
## 3  10.68  [9.59 : 11.67] 0.00345
## 4   9.99  [8.99 : 11.03] 0.00345
## 5  -0.12  [-0.79 : 0.59] 0.77931
## 6   0.74   [0.49 : 0.98] 0.00345
## 7   0.68  [-0.37 : 1.77] 0.18621
## 8   0.87   [0.02 : 2.12] 0.08966
## 9  -0.09  [-0.52 : 0.35] 0.67586
## 10  0.48   [-0.16 : 1.1] 0.11724
## 11 -1.28 [-1.63 : -0.95] 0.00345
## 12 -0.38 [-0.55 : -0.25] 0.00345
## 13 -0.88  [-1.1 : -0.69] 0.00345
## 14 -0.55 [-0.76 : -0.31] 0.00345
## 15 -2.68   [-6.53 : 1.9] 0.18621
## 16  0.02  [-1.16 : 1.82] 0.96552
## 17 -0.38  [-1.66 : 0.93] 0.55862
## 18 -0.99 [-2.11 : -0.02] 0.05517
## 19 -0.25     [-0.48 : 0] 0.04828
## 20  1.21   [0.74 : 1.73] 0.00345
## 21  0.01  [-0.18 : 0.21] 0.93103
## 22  0.73   [0.44 : 1.01] 0.00345
## 23  2.02    [0.41 : 3.6] 0.01379
## 24  1.02     [0.4 : 1.7] 0.00690
## 25  0.89   [0.02 : 1.94] 0.06897
## 26  0.73   [0.19 : 1.16] 0.00690
## 27  0.00  [-0.23 : 0.21] 0.96552
## 28  0.15  [-0.15 : 0.44] 0.33793
## 29  0.48   [0.18 : 0.83] 0.00345
## 30 -1.18 [-1.89 : -0.51] 0.00345
## 31  0.09  [-0.22 : 0.38] 0.53793
## 32 -0.59 [-0.98 : -0.16] 0.00690