UNIVERSITY OF MARYLAND


This webpage provides the data used and produced in research for the paper, "Did Caselaw Foster England's Economic Development During the Industrial Revolution? Data and Evidence", forthcoming in the Journal of Comparative Economics, 2024. The data and programs used in this paper can be divided into two elements:
1) The estimation of the Structural Topic Model. Full details on this are provided immediately below.
2) The data and methods to estimate the VAR. All estimates and figures were produced using standard EViews routines. We will be adding more information to this page soon, but those wishing to replicate the findings of this paper before then are encouraged to contact the authors right away. We can then directly answer any questions concerning the specifications used in obtaining the estimates or producing the figures.


The Structural Topic Model estimates reported in this paper were produced with R's stm package. The first table below describes the two R data files generated in the two key stages of data processing and analysis. The second table contains a subset of the most pertinent objects that can be extracted directly from the two R data files, presented here in more universal data formats.


To replicate any commands listed in the second table, you may have to install and load the stm package using the following code.
install.packages('stm')
library(stm)


Please note that, because of package updates, some results may not be exactly identical to the originals. When this site was created, version 1.3.6 of stm was in use; the same version was used to create the .rda and .rds files.
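Since results can depend on the package version, it may help to check which version of stm is installed. The following is only a sketch: the helper name matches_reference_version is hypothetical and not part of stm, and the commented-out install line assumes that the 'remotes' package is available and that version 1.3.6 can still be obtained from the CRAN archive.

```r
# Hypothetical helper (not part of stm): compares a version string against
# the version reported on this page.
matches_reference_version <- function(installed, reference = "1.3.6") {
  package_version(installed) == package_version(reference)
}

# Report a mismatch if stm is installed in a different version.
if (requireNamespace("stm", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("stm"))
  if (!matches_reference_version(installed)) {
    message("Installed stm version is ", installed,
            "; results may differ slightly from those produced with 1.3.6.")
  }
}

# One possible way to install the older release (assumption: 'remotes' is
# installed and 1.3.6 is still available in the CRAN archive):
# remotes::install_version("stm", version = "1.3.6")
```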


Key conceptual elements of a topic model
Topic: probability distribution over vocabulary.
Document: probability distribution over topics.
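
The two definitions above can be illustrated with a toy example in base R. All numbers and words below are made up for illustration and are unrelated to the estimates in the paper; the object names topic_word and doc_topic are likewise hypothetical.

```r
# Toy example with made-up numbers: 2 topics, 4 vocabulary words, 3 documents.
vocab <- c("contract", "land", "ship", "trust")

# Each row is one topic: a probability distribution over the vocabulary.
topic_word <- matrix(c(0.6, 0.1, 0.2, 0.1,
                       0.1, 0.5, 0.1, 0.3),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("topic1", "topic2"), vocab))

# Each row is one document: a probability distribution over topics.
doc_topic <- matrix(c(0.9, 0.1,
                      0.5, 0.5,
                      0.2, 0.8),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("doc", 1:3), c("topic1", "topic2")))

# Both kinds of rows sum to one.
rowSums(topic_word)
rowSums(doc_topic)

# Implied word distribution of each document: a mixture of the topics,
# weighted by that document's topic proportions.
doc_word <- doc_topic %*% topic_word
rowSums(doc_word)
```

In this sketch, topic_word plays the role of the exponentiated beta matrix and doc_topic plays the role of theta, both of which are described in the tables below.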

R Data Files Basic Documentation
Processed Data: ER2ProcessedData.rda Output of the prepDocuments function. Includes document-level vocabulary distributions and document metadata.

Contains the following R objects (see the official stm documentation for details):
vocab - a character vector containing, in alphabetical order, the words that were used in the statistical analysis
meta - a table containing metadata for every document (e.g. author, year, etc.) as well as the document texts in their form prior to R processing but after the pre-processing described in the paper. These texts are therefore not the original documents.
The STM Model: ER2STMestimates.rds Output of the stm function. Includes estimated parameters for analysis and topic correlation data.

Contains the following R objects (see the official stm documentation for details):
beta - matrix giving, for every topic and word, the log probability of that word under that topic
vocab - same as in the .rda file
theta - see "Topic-Proportion Matrix" below
version - the version number of the package with which the model was estimated
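
Because beta holds log probabilities, exponentiating it recovers each topic's probability distribution over the vocabulary. The sketch below uses a made-up toy matrix rather than the actual estimates. With the real model object, the corresponding matrix is expected to sit at stmModel$beta$logbeta[[1]], but this is an assumption that should be checked with str(stmModel$beta).

```r
# Toy stand-in for the log-probability matrix (made-up values):
# rows are topics, columns are vocabulary words.
vocab <- c("contract", "land", "ship")
log_beta <- log(matrix(c(0.7, 0.2, 0.1,
                         0.1, 0.3, 0.6),
                       nrow = 2, byrow = TRUE))

# Exponentiate to obtain probabilities; each topic's row sums to one.
beta_prob <- exp(log_beta)
rowSums(beta_prob)

# Most probable word in each topic.
top_word <- vocab[max.col(beta_prob, ties.method = "first")]
top_word
```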

NOTE: No use of R is required to use the files in the "Extracted Data + Description" column. The "R Code" column is simply provided to show how experienced users can load these files directly into R.

Extracted Data + Description R Code
Bag of Words: corpusVocab.csv

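CSV file containing every word in the corpus not eliminated by the textProcessor function. Also includes, for each word, a corpus-level count (the wordcounts object returned by prepDocuments, which tabulates the number of documents in which the word appears).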
How to obtain from the .rda object:

First load the RDA object (by default its variable name will be "out")
load("/insertYourPathHere/ER2ProcessedData.rda")
bag_of_words <- sort(c(out$vocab, out$words.removed))
corpusVocab <- data.frame(bag_of_words, out$wordcounts)
write.csv(corpusVocab, file="corpusVocab.csv")
The Topics: topics.txt

Words that are most used by each topic, according to four different criteria.
How to obtain from the .rds object:

First load the RDS object with a variable name of your choosing
stmModel <- readRDS("/insertYourPathHere/ER2STMestimates.rds")
sink("topics.txt")
# To run the following command you will need to have installed and loaded the 'stm' package in R
labelTopics(stmModel, n = 30, frexweight = 0.25)
sink()
Topic-Proportion Matrix: mtheta.csv

Matrix of estimated topic prevalences across documents. Rows contain documents and columns contain topics.
How to obtain from the .rds object:

First load the RDS object (if you haven't already)
stmModel <- readRDS("/insertYourPathHere/ER2STMestimates.rds")
mtheta <- stmModel$theta
write.csv(mtheta, file="mtheta.csv")
Topic-Correlation Matrix: topicCorr.csv

Matrix of estimated document-level topic correlations.
How to obtain from the .rds object:

First load the RDS object (if you haven't already)
stmModel <- readRDS("/insertYourPathHere/ER2STMestimates.rds")
# To run the following command you will need to have installed and loaded the 'stm' package in R
topCorr <- topicCorr(stmModel)
write.csv(topCorr$cor, file = "topicCorr.csv")
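
Once extracted, a file such as mtheta.csv can be analyzed without the stm package. The following sketch writes a small made-up matrix to a temporary CSV file, using the same write.csv pattern as above, and reads it back. The values, the temporary file, and the object names are illustrative assumptions; in practice the path would point to the downloaded mtheta.csv.

```r
# Toy stand-in for mtheta.csv (made-up values): 3 documents, 2 topics.
toy_theta <- matrix(c(0.9, 0.1,
                      0.4, 0.6,
                      0.2, 0.8),
                    nrow = 3, byrow = TRUE)
toy_path <- tempfile(fileext = ".csv")
write.csv(toy_theta, file = toy_path)

# write.csv stores row names in the first column, so read them back as such.
theta <- as.matrix(read.csv(toy_path, row.names = 1))

# Each document's topic proportions should sum to (approximately) one.
stopifnot(all(abs(rowSums(theta) - 1) < 1e-8))

# Topic with the highest estimated proportion in each document.
dominant_topic <- max.col(theta, ties.method = "first")

# Average prevalence of each topic across documents.
mean_prevalence <- colMeans(theta)
```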