Sunday, January 17, 2016

Automating R exercises and exams using the exams package

It's a pain to design statistics exercises each semester, and because students from previous semesters share old exercises with the incoming students, it's hard to design simple exercises that students haven't already seen the answers to. On top of that, some students try to cheat during the exam by looking over the shoulders of their neighbors. Homework exercises almost always involve collaboration, even if you prohibit it.

It turns out that you can automate the generation of fixed-format exercises (with different numerical answers being required each time). You can also randomly select questions from a question bank you create yourself. And you can even create a unique question paper for each student in an exam, to make cheating between neighbors essentially impossible (even if they copy the correct answer to question 2 from a neighbor, they end up answering the wrong question on their own paper).

All this magic is made possible by the exams package in R. The documentation is of course comprehensive and there is a journal article explaining everything:
Achim Zeileis, Nikolaus Umlauf, Friedrich Leisch (2014). Flexible Generation of E-Learning Exams in R: Moodle Quizzes, OLAT Assessments, and Beyond. Journal of Statistical Software 58(1), 1-36.
I also use this package to deliver auto-graded exercises to students over datacamp. See here for the course I teach, and here for the datacamp exercises.

Here is a quick example to get people started on designing their own customized, automated exams. In my example below, there are several files you need.

1. The template files for your exam (what your exam or homework sheet will look like), and the solutions file. I provide two example files: test.tex and solutiontest.tex.

2. The exercises or exam questions themselves: I provide two as examples. The first file is called pnorm1.Rnw. It's an Sweave file, and it contains the code for generating a random problem and for generating its solution. The code should be self-explanatory. The second file is called sesamplesize1multiplechoice.Rnw and has a multiple choice question.

3. The exam generating R code file: The code is commented and self-explanatory. It will generate the exercises, randomize the order of presentation (if there are two or more exercises), and generate a solutions file. The output will be one or more exam papers (depending on how many versions you want generated), and the corresponding solutions file(s). Notice the cool thing that even in my example, with only one question, the two versions of the exam have different numbers, so two people cannot consult each other and just write down one answer. Each student could in principle be given a unique set of exercises, although it would be a lot of work to grade them manually.
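
The exercise files themselves are not reproduced in this post, so the following is only a rough sketch, in plain R, of the kind of code a randomized exercise such as pnorm1.Rnw might contain. The function name make_pnorm_exercise, the parameter ranges, and the question wording are all made up for illustration; the real file wraps code like this in Sweave chunks together with the LaTeX text of the question.

```r
# Hypothetical helper (not from pnorm1.Rnw): draws random parameters for a
# normal-probability question and computes the matching solution.
make_pnorm_exercise <- function() {
  mu    <- sample(50:150, 1)                         # random mean
  sigma <- sample(5:20, 1)                           # random standard deviation
  x     <- round(rnorm(1, mean = mu, sd = sigma), 1) # random cutoff near the mean
  list(
    question = sprintf(
      "X is normally distributed with mean %d and sd %d. What is P(X < %.1f)?",
      mu, sigma, x
    ),
    solution = round(pnorm(x, mean = mu, sd = sigma), 3)
  )
}

set.seed(2016)  # fixed seed only so that this illustration is reproducible
version1 <- make_pnorm_exercise()
version2 <- make_pnorm_exercise()

cat(version1$question, "\n")
cat(version2$question, "\n")
```

Each call draws fresh numbers, which is why two versions of the same exam ask the same kind of question but require different answers.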

Here is the exam generating code:

Save from the gists given above (a) the test.tex and solutiontest.tex files, (b) the Rnw files containing the exercise (pnorm1.Rnw, and sesamplesize1multiplechoice.Rnw), and (c) the exam generating code (ExampleExamCode.R).  Put all of these into your working directory, say ExampleExam. Then run the R code, and be amazed.

If something is broken in my example, please let me know.

Shuffling questions: If you want to reorder the questions in each run of the R code, just change myexamlist to sample(myexamlist) in the call to exams() in the file ExampleExamCode.R, so that it reads:

sol <- exams(sample(myexamlist), n = num.versions, 
             dir = odir, template = c("test", "solutiontest"),
             header = list(ID = getID, Date = Sys.Date()))
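
The call above uses several objects (myexamlist, num.versions, odir, getID) that are defined earlier in ExampleExamCode.R, which is not shown in this post. As a minimal sketch, and only as an assumption about what such definitions might look like, they could be set up as follows. The values, the output directory, and the ID format are hypothetical; only the object names and the exercise file names come from the post.

```r
# Exercise files named in this post, given without the .Rnw extension
# (add the extension if your version of the exams package needs it).
myexamlist <- list("pnorm1", "sesamplesize1multiplechoice")

# Number of exam versions to generate (the post mentions two versions).
num.versions <- 2

# Hypothetical output directory for the generated PDFs.
odir <- file.path(tempdir(), "ExampleExam-output")

# Hypothetical ID generator: takes the version index and returns a label
# that is passed to the LaTeX templates through the header argument.
getID <- function(i) sprintf("exam%02d", i)

# sample() returns the same exercises in a random order.
set.seed(1)  # fixed seed only so that this illustration is reproducible
shuffled <- sample(myexamlist)
print(unlist(shuffled))
print(getID(1))
```

Since sample() only permutes the list, every version still contains all the exercises; only the order of presentation changes from run to run.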

Wednesday, January 06, 2016

My MSc thesis: A meta-analysis of relative clause processing in Mandarin Chinese using bias modelling

Here is my MSc thesis, which was submitted to the University of Sheffield in September 2015. 

The pdf is here.

Title: A Meta-analysis of Relative Clause Processing in Mandarin Chinese using Bias Modelling

The reading difficulty associated with Chinese relative clauses presents an important empirical problem for psycholinguistic research on sentence comprehension processes. Some studies show that object relatives are easier to process than subject relatives, while others show the opposite pattern. If Chinese has an object relative advantage, this has important implications for theories of reading comprehension. In order to clarify the facts about Chinese, we carried out a Bayesian random-effects meta-analysis using 15 published studies; this analysis showed that the posterior probability of a subject relative advantage is approximately $0.77$ (mean $16$ ms, 95% credible interval $-29$ to $61$ ms). Because the studies had significant biases, it is possible that these biases confounded the results. Bias modelling is a potentially important tool in such situations because it uses expert opinion to incorporate the biases in the model. As a proof of concept, we first identified biases in five of the fifteen studies, and elicited priors on these using the SHELF framework. Then we fitted a random-effects meta-analysis, including priors on biases. This analysis showed a stronger posterior probability ($0.96$) of a subject relative advantage compared to the standard random-effects meta-analysis (mean $33$ ms, credible interval $-4$ to $71$ ms).
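
As a rough sanity check on how the posterior probabilities relate to the reported means and intervals, one can approximate each posterior by a normal distribution. This is an assumption made here purely for illustration, not the method used in the thesis, and the function name prob_positive is hypothetical. The second interval is treated as a 95% interval, which the abstract does not state explicitly.

```r
# Normal approximation (an assumption, not the thesis's model): recover the
# standard deviation from the interval width, then compute P(effect > 0).
prob_positive <- function(post_mean, lower, upper, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  sd <- (upper - lower) / (2 * z)   # approximate posterior standard deviation
  pnorm(post_mean / sd)             # probability that the effect is above zero
}

# Standard random-effects meta-analysis: mean 16 ms, interval -29 to 61 ms
p_standard <- prob_positive(16, -29, 61)

# Meta-analysis with priors on biases: mean 33 ms, interval -4 to 71 ms
p_bias <- prob_positive(33, -4, 71)

print(round(c(standard = p_standard, bias_model = p_bias), 2))
```

Under this approximation the two probabilities come out at roughly 0.76 and 0.96, close to the reported 0.77 and 0.96; small differences are expected because the actual posteriors need not be exactly normal.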