Lancaster Stats Tools
Welcome to Lancaster Stats Tools
Do you use corpora in your research or study, but find that you struggle with statistics? This
practical introduction will equip you to understand the key principles of statistical thinking and
apply these concepts to your own research, without the need for prior statistical knowledge. The
book gives step-by-step guidance through the process of statistical analysis and provides multiple
examples of how statistical techniques can be used to analyse and visualise linguistic data. It also
includes a useful selection of discussion questions and exercises which you can use to check your
understanding.
Lancaster Stats Tools is a companion website to the book. It contains additional materials (video
lectures, exercises, data, slides and lesson plans) as well as easy-to-use tools for calculating
statistics and producing graphs.
Enter a mathematical expression, to be evaluated in R.
Generate a graph
Generate randomized data
Example data
Concordance for the lemma GO
[csv]
[xlsx]
'The' in BE06
[csv]
[xlsx]
Passives in BE06 - genres
[csv]
[xlsx]
'The' and 'I' in BNC64
[csv]
[xlsx]
'Go'/'travel' in BNC
[csv]
[xlsx]
'Lovely' in BNC64: Male and female speech
[csv]
[xlsx]
'Lovely' in BNC64: Age
[csv]
[xlsx]
Modals in the Brown family - frequencies
[csv]
[xlsx]
Modals in the Brown family - concordances
[csv]
[xlsx]
Modals in the Brown family - summary
[csv]
[xlsx]
Data visualization
[xlsx]
Exercises
Answers
Text analysis
Create a wordlist
Calculate dispersion
Calculate ARF
Example data
BNC frequency list (BNCweb)
[txt]
DP calculations
[xlsx]
Ex 7 - Table
[xlsx]
Exercises
Answers
Calculate collocations
Agreement calculator
#LancsBox X is a multi-platform tool for analysing language.
#LancsBox can identify collocations and keywords, among other things. It is not available as a
web tool, but you can download it to your computer for free.
Find out more about #LancsBox X here
Example data
Inter-rater agreement (exercise 9)
[csv]
[xlsx]
Inter-rater agreement (example)
[csv]
[xlsx]
Guardian comments
[txt]
Daily Mail comments
[txt]
Exercises
Answers
Agreement R code
R functions:
#nominal
gwet.ac1.raw(myData)
myData1<-table(myData); kappa2.table(myData1)
#nominal, 3+ raters
fleiss.kappa.raw(myData)
#ordinal
gwet.ac1.raw(myData, weights="ordinal")
#scale
ICC(myData)
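To make the idea of chance-corrected agreement concrete, the following is a minimal base-R sketch of Cohen's kappa for two raters and nominal categories. It does not reproduce the internals of the functions listed above; the ratings and the names rater1, rater2 and cohen_kappa are hypothetical and used for illustration only.

```r
# Hypothetical ratings from two raters on ten items (illustrative data only)
rater1 <- c("a", "a", "b", "b", "a", "b", "a", "a", "b", "b")
rater2 <- c("a", "a", "b", "a", "a", "b", "a", "b", "b", "b")

# Cross-tabulate the ratings, using the same category set for both raters
categories <- sort(unique(c(rater1, rater2)))
tab <- table(factor(rater1, levels = categories),
             factor(rater2, levels = categories))
n <- sum(tab)

# Observed agreement: proportion of items on the diagonal
p_observed <- sum(diag(tab)) / n

# Agreement expected by chance, from the marginal totals
p_expected <- sum(rowSums(tab) * colSums(tab)) / n^2

# Cohen's kappa: agreement beyond chance, scaled by the maximum possible
cohen_kappa <- (p_observed - p_expected) / (1 - p_expected)
print(cohen_kappa)
```

Here the raters agree on 8 of 10 items, while the marginal totals imply 50% agreement by chance, so kappa is (0.8 - 0.5) / (1 - 0.5).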
Cross-tabulate data
Compare categories
Build a model
Example data
The vs. a(n) dataset
[csv]
[xlsx]
Modals dataset
[csv]
[xlsx]
Modals dataset with genre coding
[csv]
[xlsx]
Modals dataset with variety coding
[csv]
[xlsx]
Cross-tab of modals of obligation
[csv]
[xlsx]
Cross-tab of which and that
[csv]
[xlsx]
Which and that dataset for logistic regression
[csv]
[xlsx]
Exercises
Answers
Category comparison R code
source("
");
source("
");
source("
");
#cross-tabulate
data<- table(data)
#statistical tests
chisq.test(data, correct = FALSE);
chisq.test(data, correct = TRUE);
g.test(data, correct = "none");
fisher.test(data);
#effect sizes
CramerV(data, conf.level = 0.95);
riskratio(data);
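As a worked illustration of how a test statistic relates to an effect size, the sketch below runs a chi-squared test on a small 2x2 table and derives Cramér's V by hand in base R. The counts and the names counts, test and cramers_v are hypothetical; the hand calculation is one standard formula for V and is not necessarily how the CramerV function above computes it.

```r
# Hypothetical 2x2 cross-tabulation (illustrative counts only)
counts <- matrix(c(30, 20,
                   10, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(corpus = c("A", "B"),
                                 variant = c("which", "that")))

# Chi-squared test without continuity correction
test <- chisq.test(counts, correct = FALSE)
chi2 <- unname(test$statistic)

# Cramer's V: sqrt(chi2 / (N * (k - 1))), where k is the smaller table dimension
cramers_v <- sqrt(chi2 / (sum(counts) * (min(dim(counts)) - 1)))

print(test)
print(cramers_v)
```

For a 2x2 table, k - 1 equals 1, so V reduces to the square root of chi-squared divided by the total number of observations.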
Correlation calculator
Cluster data
Multidimensional analysis
Example data
Correlations
[csv]
[xlsx]
Clusters
[csv]
[xlsx]
MD BE06 (British English)
[csv]
[xlsx]
MD AmE06 (American English)
[csv]
[xlsx]
New Zealand English - ICE-NZ
[xlsx]
Exercises
Answers
Correlation R code
library(Hmisc); library(corrplot); library(stats); #libraries used
cor.test(mydata1, mydata2, method="pearson") #Pearson's correlation
cor.test(mydata1, mydata2, method="spearman") #Spearman's correlation
rcorr(mydata, type="pearson") #correlation matrix
plot(mydata, col ="blue"); fitline <- lm(mydata1 ~ mydata2); abline(fitline,col="red") #scatter plot
corrplot(m, method ="color", type = "full", diag = TRUE, addCoef.col="black", addCoefasPercent=FALSE, addgrid.col="grey", tl.pos = NULL, tl.cex = 1, tl.srt = 45, tl.col = "black") #correlation matrix
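The difference between the two correlation methods can be seen in a small base-R sketch. The vectors x and y are hypothetical and constructed so that the relationship is perfectly monotonic but not linear; no data from the example files is used.

```r
# Illustrative vectors (not taken from the example data sets)
x <- 1:10
y <- x^2  # monotonic but non-linear relationship

# Pearson measures linear association; Spearman measures rank agreement
pearson <- cor.test(x, y, method = "pearson")
spearman <- cor.test(x, y, method = "spearman")

pearson_r <- unname(pearson$estimate)
spearman_rho <- unname(spearman$estimate)

# Regress the y-axis variable on the x-axis variable, so that the
# fitted line matches a scatter plot drawn with plot(x, y)
fitline <- lm(y ~ x)

print(c(pearson = pearson_r, spearman = spearman_rho))
```

Because the ranks of x and y agree perfectly, Spearman's rho reaches 1 while Pearson's r stays below 1.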
Cluster R code
mydata <- scale(mydata) # optional z-score transformation
d <- dist(mydata, method = "manhattan") # distance matrix
fit <- hclust(d, method="ward.D") #Cluster analysis
plot(fit, xlab="", ylab="Height", main="")#plot dendrogram
rect.hclust(fit, k=5, border="red") #draw cluster groups
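The same steps can be tried on a toy data set. The sketch below uses base R only; the matrix profiles, its row and column names, and the choice of two clusters are hypothetical and chosen so that the grouping is obvious.

```r
# Hypothetical feature profiles for six texts (illustrative values only)
profiles <- rbind(text1 = c(1.0, 2.0),
                  text2 = c(1.2, 1.9),
                  text3 = c(0.9, 2.2),
                  text4 = c(8.0, 9.0),
                  text5 = c(8.3, 9.1),
                  text6 = c(7.9, 8.8))
colnames(profiles) <- c("feature1", "feature2")

scaled <- scale(profiles)                  # z-score transformation
d <- dist(scaled, method = "manhattan")    # distance matrix
fit <- hclust(d, method = "ward.D")        # hierarchical clustering

# Cut the tree into two groups and list the membership of each text
groups <- cutree(fit, k = 2)
print(groups)
```

cutree returns the cluster membership as a named vector, which is convenient when the dendrogram is too large to read directly.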
MD analysis R code
cortest.bartlett(mydata); det(cor(mydata))# Bartlett's test and multicollinearity test
fa.parallel(mydata, fa="fa", main = "Scree Plot", show.legend=FALSE) #screeplot
factanal(mydata, number, rotation="promax") #factor analysis
Group comparison calculator
Correspondence analysis chart
Mixed effect logistic regression
Example data
T-test or Mann-Whitney U test
[csv]
[xlsx]
ANOVA or Kruskal-Wallis
[csv]
[xlsx]
Correspondence analysis
[csv]
[xlsx]
Mixed effect model
[csv]
[xlsx]
White House Press Conferences
[csv]
[xlsx]
Exercise 6
[xlsx]
Open BNC64 in a new tab
Exercises
Answers
Group comparison R code
#t-test
t.test(data[ ,1], data[ ,2], paired=FALSE)
#t-test: repeated measures
t.test(data[ ,1], data[ ,2], paired=TRUE)
#Mann-Whitney-Wilcoxon rank sum test
wilcox.test(data[ ,1], data[ ,2], paired=FALSE)
#Wilcoxon signed-rank test: repeated measures
wilcox.test(data[ ,1], data[ ,2], paired=TRUE)
#One-way ANOVA
aov(measurement ~ group, data = data)
#Kruskal-Wallis test
kruskal.test(data)
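The one-way ANOVA call above expects data in long format, with one column of measurements and one column of group labels. The sketch below shows that layout on invented values, using the formula interface for both tests. The data frame scores and its contents are hypothetical; measurement and group follow the column names in the ANOVA call above.

```r
# Hypothetical long-format data: one measurement per row, with a group label
scores <- data.frame(
  measurement = c(10, 12, 11, 13,   # group A
                  15, 17, 16, 18,   # group B
                  20, 22, 21, 23),  # group C
  group = factor(rep(c("A", "B", "C"), each = 4))
)

# One-way ANOVA; summary() gives the F statistic and p-value
anova_fit <- aov(measurement ~ group, data = scores)
anova_table <- summary(anova_fit)[[1]]
print(anova_table)

# Kruskal-Wallis test on the same layout, via the formula interface
kw <- kruskal.test(measurement ~ group, data = scores)
print(kw)
```

With three groups of four observations, both tests have 2 degrees of freedom for the group effect.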
Correspondence R code
library(languageR);
x = corres.fnc(data);
plot(x, ccex = 0.6, rcex = 0.6);
Mixed Effect Logistic Regression R code
library(lme4);
glmer(outcome~predictor+(1|randeffect), family = binomial, data = mydata);
Bootstrapping test
Neighbour clusters
Peaks and Troughs
Usage fluctuation analysis
Example data
Modals in BrE 1931 - 2006
[csv]
[xlsx]
Bootstrap test data
[csv]
[xlsx]
VNC cluster data
[csv]
[xlsx]
Peaks & troughs data
[csv]
[xlsx]
UFA data: 'whore' in the 17th century
[zip]
Colours - dataset
[xlsx]
Exercises
Answers
Bootstrapping R code
library(boot);
source("
");
bootstraptest(period1, period2,samples,'p2');
percid(b);
boot(data=b, statistic=percid, R=samples);
Neighbour R code
source("
");
abc(data,"sd");
Peaks and Troughs R code
library(ggplot2)
library(mgcv)
p<-ggplot(data, aes(x = data[,1], y =data[,2])) + geom_point() + xlab("Time") + ylab("Linguistic variable"); p + stat_smooth(method = "gam", formula = y ~ s(x, bs = "cr", fx=FALSE, k =15), size = 1, fill="#707070", level = 0.95 )+ stat_smooth(method = "gam", formula = y ~ s(x, bs = "cr", fx=FALSE, k =15), size = 1, fill="#FFFF00", level = 0.99);
UFA R code
#Calculate Gwet's AC1; b...input data frame
i = 1; v <- c(); while(i+1 < ncol(b)) {n=(gwet.ac1.raw(b[,i:(i+2)])[3]);v<- c(v, n); i= i+1; }

#Prepare data frame
h<-seq(from, to, by = 1); g<-data.frame(h,v)

#Produce graph
p<-ggplot(g, aes(x = g[,1], y =g[,2])) + xlim(from, to)+ scale_x_continuous(breaks = seq(from, to, by = 10)) + geom_point() + xlab("Time") + ylab("AC1"); p + stat_smooth(method = "gam", formula = y ~ s(x, bs = "cr", fx=FALSE, k =10), size = 1, fill="#707070", level = 0.95 )+ stat_smooth(method = "gam", formula = y ~ s(x, bs = "cr", fx=FALSE, k =10), size = 1, fill="#FFFF00", level = 0.99)
Effect size calculator
Meta-analysis calculator
Example data
Meta-analysis
[csv]
[xlsx]
Exercises
Answers
Effect size R code
library(compute.es)
pes(p,n1,n2) #based on p-value
mes(m1,m2,sd1,sd2,n1,n2) #based on means
tes(t,n1,n2) #based on t-value (t-test)
fes(F,n1,n2) #based on F (ANOVA)
res(r,NULL,n) #based on r (e.g. correlation)
des(d,n1,n2) #based on Cohen's d
lores(lor,var,n1,n2) #based on Log Odds Ratio
d=(2*r)/sqrt(1-(r*r)) #based on r only
r=d/sqrt((d*d)+4) #based on Cohen's d only
d=(2*sqrt(e))/sqrt(1-e) #based on eta2
d=(lor*sqrt(3))/pi #based on Log Odds Ratio only
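The four conversion formulas above can be wrapped in small helper functions and checked with a round trip. This is a sketch in base R; the function names r_to_d, d_to_r, eta2_to_d and lor_to_d are hypothetical and do not belong to any package.

```r
# Conversion helpers implementing the formulas listed above
r_to_d    <- function(r)    (2 * r) / sqrt(1 - r^2)           # r -> Cohen's d
d_to_r    <- function(d)    d / sqrt(d^2 + 4)                 # Cohen's d -> r
eta2_to_d <- function(eta2) (2 * sqrt(eta2)) / sqrt(1 - eta2) # eta squared -> d
lor_to_d  <- function(lor)  (lor * sqrt(3)) / pi              # log odds ratio -> d

# Worked example: r = 0.6 gives d = 1.2 / 0.8 = 1.5
d_from_r <- r_to_d(0.6)
print(d_from_r)

# Converting back recovers the original r: 1.5 / sqrt(6.25) = 0.6
r_back <- d_to_r(d_from_r)
print(r_back)
```

The round trip is a quick sanity check that the two formulas are inverses of each other. Both assume equal group sizes.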
Meta-analysis R code
library(meta)
#Calculate Variance ES
es.d.v <-(((n1+n2)/(n1*n2))+(es.d^2/(2*(n1+n2))))
#Calculate Standard Errors ES
d.se<-sqrt(es.d.v)
meta1<-metagen(es.d, d.se)
forest(meta1, studlab=c("Study1","Study2","Study3","Study4","Study5"), xlab="Cohen’s d", col.square="black",xlim=c(-3,3), col.diamond="black", fontsize=14, squaresize=0.5, leftcols=c("studlab"), rightcols=c("effect", "ci"), hetstat=FALSE, comb.fixed=FALSE, text.random="Overall ES", print.tau2=FALSE,print.I2=FALSE,TE.random=FALSE, seTE.random=FALSE)
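To show what the variance and standard error steps produce, the sketch below applies the same formulas to three invented studies and then computes an inverse-variance weighted average in base R. The effect sizes and sample sizes are hypothetical. The pooled value is a simple common-effect estimate, given only to illustrate the weighting; it is not the random-effects estimate that the forest plot above is set up to display.

```r
# Hypothetical effect sizes (Cohen's d) and group sizes for three studies
es.d <- c(0.2, 0.5, 0.8)
n1 <- c(20, 50, 30)
n2 <- c(20, 50, 30)

# Variance and standard error of each effect size (formulas as above)
es.d.v <- ((n1 + n2) / (n1 * n2)) + (es.d^2 / (2 * (n1 + n2)))
d.se <- sqrt(es.d.v)

# Inverse-variance weights: more precise studies count for more
w <- 1 / es.d.v
pooled_d <- sum(w * es.d) / sum(w)
pooled_se <- sqrt(1 / sum(w))

print(data.frame(es.d, es.d.v, d.se, weight = w))
print(c(pooled_d = pooled_d, pooled_se = pooled_se))
```

For the first study the variance is 40/400 + 0.04/80 = 0.1005, and the largest study receives the largest weight.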
Video Lectures
Watch instructional videos about statistics and why it matters in language and everyday life.
Watch lectures
Slides
Download pptx slides on a range of statistical topics related to each of the eight chapters in the book.
View downloads
Lesson Plan
Download lesson plans for teachers related to each of the eight chapters in the book.
View downloads
Readings
Explore corpus statistics through additional readings.
Explore readings