Monday, October 31, 2011

Graduate certificate in statistics

I finally have a sabbatical; such a relief not to have to teach for a bit. What can one usefully do during a sabbatical? Here's a preliminary list:

(a) bird watching
(b) lounge around
(c) check email relentlessly
(d) surf the web for six months straight
(e) do a statistics degree

I decided to go for (e). I never really learned statistics systematically (some would say, not at all, and they would not be far off the mark ;), which is absurd if you consider that it's practically my bread and butter (using statistics for data analysis). Why isn't everyone in psycholinguistics a professional statistician? God knows we all need that expertise.

So I surfed the web a bit and found something that kills two birds with one stone: a part-time distance MSc in statistics at Sheffield. The time commitment is 20 hours a week (i.e., about 3 hours a day if you work on it seven days a week); very minimal, I would say. If I cut out reading fiction and other non-work books in the evenings after dinner, that's three hours right there every day.

At Ohio State, one of my professors, Brian Joseph, related the story of a physicist at OSU who decided to learn Sanskrit by taking one of Brian's classes; he did his studies in the evenings after dinner. Before long he was teaching the course with Brian. So my only question is: why the hell didn't I think of formally learning statistics earlier?

I'm doing the prep course first, to review the math, stats, and probability theory that are assumed for the MSc, and plan to start the MSc next year (I do have to pass the prep course; it would be pretty annoying if I failed ;). And the course is great! I never realized back in school that math is FUN and probability theory is FUN; I only understood that once I got to Ohio State. And I'm again enjoying struggling with unfamiliar problems. The only annoying thing is that the homework assignments have to, of course, be submitted online, and so I feel compelled to typeset them nicely using Sweave+LaTeX, which is very time consuming. At Ohio State I never did that; I just wrote them up by hand (e.g., in formal language theory or discrete math or logic classes). Somehow the idea of an online submission demands typesetting; I can't bring myself to write it with a pen, scan the solution, and send it...

The course itself is good so far (too early to tell; e.g., in the stats segment we are drawing boxplots "by hand", which I did for the first time in my life today). But the textbooks could have been better, IMHO.

Thursday, September 01, 2011

Lower p-values apparently give you more confidence in the alternative hypothesis


"But an isolated finding, especially when embodied in a 2 X 2 design, at the .05 level or even the .01 level was frequently judged not sufficiently impressive to warrant archival publication." (p. 554)
From: Melton, A. W. (1962). Editorial. Journal of Experimental Psychology, 64, 553–557.

According to Gigerenzer et al. (published in D. Kaplan (Ed.) (2004), The Sage handbook of quantitative methodology for the social sciences, pp. 391–408), this quote is the source of the common convention of using p-values as a measure of one's belief in a result.

Gigerenzer et al. write:

"Editors of major journals such as A. W. Melton (1962) made null hypothesis testing a necessary condition for the acceptance of papers and made small p-values the hallmark of excellent experimentation."

Sunday, July 17, 2011

Potsdam Mind Research Repository

Here's a revolutionary website: all data and code accompanying papers.

http://read.psych.uni-potsdam.de/pmr2/

Imagine if it were mandatory to release data with your publication! It would make life so much easier.

Thursday, December 09, 2010

Friday, October 15, 2010

I foolishly tried to convert a matrix M to a vector using as.vector:

as.vector(M)

But this is done column-wise, so that adjacent row items end up non-adjacent.

The right way to do this seems to be:

library(gdata)
unmatrix(M, byrow=TRUE)
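
If installing an extra package is not an option, a base R alternative is to transpose the matrix before flattening it. The following is only a minimal sketch: the matrix M is a made-up example, and unlike unmatrix the result carries no element names.

```r
# Hypothetical 2 x 3 matrix, filled row by row so the row order is easy to see.
M <- matrix(1:6, nrow = 2, byrow = TRUE)

# as.vector() walks down the columns (R stores matrices column-major).
by_column <- as.vector(M)

# Transposing first turns the rows of M into columns, so the result is row-wise.
by_row <- as.vector(t(M))

print(by_column)
print(by_row)
```

Here by_column interleaves the two rows, while by_row keeps adjacent row items adjacent.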

Wednesday, September 29, 2010

Mike's and my book is coming out


Our book is finally coming out.

You can buy it on Amazon.com, Amazon.de, or Springer.com.

Tildes in URLs (LaTeX)

Obscure LaTeX command:

\textasciitilde{} % for tildes in URLs as text.

I'd been using $\sim$.

Saturday, September 25, 2010

Public and private cvs (LaTeX)

Not directly related to statistics but:

Often one wants to have a public cv that one can put on the web, and a more restricted one containing private information that is only needed for a job application or the like. Instead of maintaining two cvs, there's an easy way to automate it if you are a LaTeX user.

1. For a public cv, type:

## public
pdflatex vasishthcv.tex

2. For a restricted cv, type:

## restricted
pdflatex -jobname vasishthcv "\def\UseOption{opta}\input{vasishthcv}"

where in the tex file, you have in the preamble:

\ifx\UseOption\undefined
\def\UseOption{optb}
\fi
\usepackage{optional}

and in the text itself, for restricted sections, use:

\opt{opta}{Home address:...}

Friday, December 11, 2009

Statistics in linguistics

People in linguistics tend to treat statistical theory as something that can be outsourced--we don't really need to know anything about the details, we just need to know which button to click.

People easily outsource statistical knowledge in an empirical paper, but the same people would be appalled if they hired an assistant to work out the technical details of syntactic theory for a syntax paper.

The statistics *is* the science, it's not some extra appendage that can be outsourced.

Thursday, April 23, 2009

How to get ESS-style indentation in TextMate

This should be standard in TextMate; I don't know why one has to go through so many steps to get it working:

http://gragusa.wordpress.com/2007/11/11/textmate-emacs-like-indentation-for-r-files/

How to update the R bundle in TextMate

Got this from the web somewhere:

Just create a script with the following content:


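#!/bin/sh
# Check out the TextMate R bundle, or update it if it is already installed.

# Export the locale so that svn sees it and handles UTF-8 file names.
export LC_CTYPE=en_US.UTF-8
SVN=`which svn`

echo Changing to Bundles directory...
mkdir -p /Library/Application\ Support/TextMate/Bundles
cd /Library/Application\ Support/TextMate/Bundles || exit 1

if [ -d /Library/Application\ Support/TextMate/Bundles/R.tmbundle ]; then
    # The bundle is already checked out: bring the working copy up to date.
    echo R bundle already exists - updating...
    $SVN up "R.tmbundle"
else
    # First run: check out a fresh copy of the bundle.
    echo Checking out R bundle...
    $SVN --username anon --password anon co http://macromates.com/svn/Bundles/trunk/Bundles/R.tmbundle/
fi

# Ask a running TextMate to pick up the new or updated bundle.
echo Reloading bundles in TextMate...
osascript -e 'tell app "TextMate" to reload bundles'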

Wednesday, July 04, 2007

Selection bias in journal articles

Journals dealing in psycholinguistic research generally do not publish null results, because they are "inconclusive". So it's entirely possible that out of 100 experiments, 95 are inconclusive and 5 are "significant", but that all five are Type I errors. Yet it's those 5 experiments that will get published.

The naive rebuttal to this would be that such a situation would only rarely arise. But the non-obvious thing is that rare events do happen. If we published only those five articles, then how would we draw the conclusion that we are not in Type I la la land?
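
The scenario can be made concrete with a small simulation. This is only a sketch under stated assumptions: the sample size, the seed, and the use of a two-sample t-test are illustrative choices, not a description of any real set of experiments. Every simulated experiment has a true null hypothesis, so any "significant" result is a Type I error by construction.

```r
set.seed(2007)  # arbitrary seed, only for reproducibility

n_experiments <- 100  # number of experiments, as in the example above
n_per_group   <- 20   # hypothetical sample size per condition

# Each experiment compares two conditions drawn from the SAME distribution,
# so the null hypothesis is true in every single experiment.
p_values <- replicate(n_experiments, {
  condition_a <- rnorm(n_per_group)
  condition_b <- rnorm(n_per_group)
  t.test(condition_a, condition_b)$p.value
})

# The "publishable" experiments are those with p < 0.05.
published <- p_values < 0.05

cat("Experiments run:     ", n_experiments, "\n")
cat("Significant at 0.05: ", sum(published), "\n")
```

A reader who sees only the experiments flagged in published has no way of telling from them alone that every one of those effects is spurious.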

Saturday, April 28, 2007

Rlang mailing list

Roger Levy has created a possibly useful mailing list for exchanging questions about the use of R for language research:

https://ling.ucsd.edu/mailman/listinfo.cgi/r-lang

Tuesday, April 17, 2007

How to extract SEs from lmer fixed effects estimates

Extracting fixed effects coefficients from lmer is easy:

fixef(lmer.fit)

But extracting SEs of those coefficients is, well, trivial, but you have to know what to do. It's not obvious:

Vcov <- vcov(lmer.fit, useScale = FALSE)
se <- sqrt(diag(Vcov))
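
The square-root-of-the-diagonal idiom is not specific to lmer. As a sketch that runs without any extra package, here it is applied to an ordinary lm fit on the built-in cars data; lm is a stand-in chosen for illustration, not the model from this post. For a current lme4 fit, the assumption is that the same idiom works with a plain vcov(lmer.fit), since the useScale argument dates from the lme4 of 2007 and may no longer have any effect.

```r
# Built-in data set; lm() stands in for lmer() so that no extra package is needed.
fit <- lm(dist ~ speed, data = cars)

# Variance-covariance matrix of the coefficient estimates.
Vcov <- vcov(fit)

# Standard errors are the square roots of its diagonal.
se <- sqrt(diag(Vcov))

# Cross-check against the standard errors reported by summary().
se_from_summary <- coef(summary(fit))[, "Std. Error"]
print(cbind(se, se_from_summary))
```

The two columns agree, which is a quick sanity check worth repeating on the mixed-model fit as well.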

Saturday, February 17, 2007

Hmisc: how to increase magnification

One non-obvious thing (at least to me) about Hmisc's xYplot function is that to increase the magnification of a graph component, or to change its other parameters, you have to do the following:

xlab=list("Condition",cex=2)

I.e., you have to make a list out of the parameter, and add whatever information you need. This works generally for any of the xYplot parameters.
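
xYplot is built on lattice's xyplot, and the list convention comes from there. The sketch below therefore uses plain lattice and the built-in cars data so that it runs without Hmisc; the labels and the colour are made up for illustration, and the assumption is that xYplot accepts the same kind of list.

```r
library(lattice)  # ships with standard R installations

# Each label is a list: the text first, then graphical parameters such as
# cex (magnification) or col (colour).
p <- xyplot(dist ~ speed, data = cars,
            xlab = list("Speed", cex = 2),
            ylab = list("Stopping distance", cex = 1.5, col = "darkred"))

print(p)  # lattice plots are drawn when the object is printed
```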

Thursday, January 25, 2007

Using WinBUGS with the Gelman and Hill book on Intel Macs

I finally installed Windows on my Mac (a traumatic experience) and finally got the code working. However, the startup instructions on the website of the book did not work for me. I offer a working example for other souls as clueless as myself. The first problem is that the libraries have to be installed manually; they do not install automatically as advertised. Second, the library R2WinBUGS has to be loaded explicitly to run the critical bugs command.
Also, if anyone out there is thinking of installing a dual-boot environment on a Mac in order to install WinBUGS, there is a bug (no pun intended) in the licence installation of WinBUGS. The decode command for the license does not work as advertised, but the license installs anyway.
The working version is here: http://www.ling.uni-potsdam.de/~vasishth/temp/schools2.R

Monday, January 22, 2007

Some expensive lessons I recently learnt about R/Sweave

1. If you are going to generate lots of latex tables automatically from an Rnw file, LABEL THEM.

2. weaver does not work with xYplot. If you are using the Hmisc library, just don't use weaver. I will present a solution here sometime soon.

The solution: set caching to off (cache=off) in the chunk that loads the Hmisc library and runs the xYplot command(s). You can turn caching on before and after the chunk, but xYplots need to be computed without caching.

3. xtable is unable to identify the fact that an R output line containing, e.g., log(sigma^2), has to be in math-environment in the tex. In Sweave this has the disastrous consequence that the .tex file does not compile. My kludgy solution is to search and replace the .tex file after Sweaving it.
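
The search-and-replace kludge can itself be done from R. This is a minimal sketch: the table lines are invented, and in a real workflow they would come from readLines() on the Sweave-generated .tex file and be written back with writeLines().

```r
# Hypothetical lines as they might appear in a generated .tex table.
tex_lines <- c("Intercept & 1.23 \\\\",
               "log(sigma^2) & 0.45 \\\\")

# fixed = TRUE treats both pattern and replacement as literal strings,
# so the parentheses and the caret need no regex escaping.
patched <- gsub("log(sigma^2)", "$\\log(\\sigma^2)$", tex_lines, fixed = TRUE)

writeLines(patched)
```

Only the offending cell is wrapped in math mode; every other line passes through unchanged.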

It's frustrating that such good tools can sometimes be such a pain in the ass. I guess one should be grateful they are there at all.