Some statistical jobs are either too memory-greedy or too computationally intensive to run on a local machine. At the Johns Hopkins Medical Institutions (JHMI), researchers have access to a Linux cluster running Oracle Grid Engine (previously called Sun Grid Engine).
Jobs on the Joint HPC Exchange (JHPCE) can be run interactively with the qrsh command or through a qsub bash submission. JHPCE also has Stata-MP installed, which is another reason I use it for larger jobs.
I usually submit a qsub job by typing qsub Scripts/NAME_OF_SCRIPT into the terminal. All of my scripts are kept in the Scripts directory and are usually named with a convention like run017_05.sh, which is the bash script that runs do file 05 in the 017 project (I explain my project organization in another post). The command qsub Scripts/run017_05.sh will read the following script.
#!/bin/bash -l
#$ -pe local 2
#$ -l mem_free=3G
#$ -l h_vmem=4G
#$ -m e
#$ -M email@example.com
#$ -e efiles
#$ -o ofiles
cd ~/ERGOT/000_workspace/017_multi_donor
stata-mp -b do 05_exp_multi
exit
-pe local 2 tells the cluster that I'll need two computing cores. For JHPCE users: our current Stata-MP license only allows two cores, so there's no benefit to requesting more.
-l mem_free=3G and -l h_vmem=4G tell the cluster that I need 3G of memory allocated to this job. If Stata starts requesting more than 4G, the job will be aborted. Stata-MP tends to benefit more from computing cores than RAM, so I generally keep these requests low (I often don't go over 1G of actual use). R, on the other hand, is memory-greedy, so I sometimes request 10-20G of memory for complex jobs.
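One way to check how much memory a finished job actually used, so the next request can be sized accordingly, is to read the job's accounting record. The sketch below assumes that accounting is enabled on the cluster and that qacct -j prints a maxvmem line; peak_mem is a hypothetical helper name, not something provided by Grid Engine.

```shell
#!/bin/bash
# peak_mem is a hypothetical helper: it reads `qacct -j JOB_ID` output on stdin
# and prints the value on the maxvmem line (the job's peak virtual memory).
# Assumption: the accounting record uses "fieldname   value" lines.
peak_mem() {
    awk '$1 == "maxvmem" { print $2 }'
}
```

With a real job ID in place of JOB_ID, this would be used as qacct -j JOB_ID | peak_mem. If the reported peak is far below h_vmem, the request can probably be lowered.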
The next two lines, -m e and -M email@example.com, tell the cluster to send me an email when the job is done. I wouldn't use these lines if you're submitting tons of jobs at once, but they're useful for those one-off jobs that will take hours to complete.
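For the many-jobs case, here is a rough sketch of a loop that submits every run script for one project while switching the email off. It assumes that -m n means "no mail" and that options given on the qsub command line take precedence over the #$ lines inside the script, which is my understanding of Grid Engine but worth checking on your own cluster. submit_project and the QSUB variable are hypothetical names added for illustration.

```shell
#!/bin/bash
# submit_project is a hypothetical helper: submit every run<project>_*.sh script
# in a directory (default: Scripts), passing -m n to suppress the per-job email.
# QSUB defaults to the real qsub; setting QSUB=echo gives a dry run that only
# prints the commands (an assumption for illustration, not part of Grid Engine).
submit_project() {
    local project="$1" dir="${2:-Scripts}" script
    for script in "$dir"/run"${project}"_*.sh; do
        [ -e "$script" ] || continue   # nothing matched: skip the literal pattern
        "${QSUB:-qsub}" -m n "$script"
    done
}
```

Running QSUB=echo submit_project 017 first prints the commands without submitting anything, which is a cheap way to confirm the right scripts were picked up.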
The next two lines, -e efiles and -o ofiles, tell qsub to put error files in the efiles directory (located at ~/efiles) and output files in the ofiles directory.
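As far as I know, Grid Engine will not create these directories for you, so they need to exist before the first submission. A minimal sketch follows; make_log_dirs is a hypothetical helper, and the default base directory of $HOME is an assumption based on the ~/efiles location mentioned above.

```shell
#!/bin/bash
# make_log_dirs is a hypothetical helper: create the efiles and ofiles
# directories under a base directory (default: $HOME). mkdir -p is a no-op
# when the directories already exist, so this is safe to run repeatedly.
make_log_dirs() {
    local base="${1:-$HOME}"
    mkdir -p "$base/efiles" "$base/ofiles"
}
```

Calling make_log_dirs once with no arguments before submitting is enough.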
Finally, we get to the actual Stata part. First I change to the working directory with the cd command. Then I run Stata in batch mode using stata-mp -b. Note that you could also write stata-se -b or stata -b if you don't want to use Stata-MP. On the same line I give Stata the do command and the name of the do file. After that runs, the exit line ends the script.
Here's a gist.
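Since every run script has the same shape, it can also be generated from the naming convention rather than copied by hand. The sketch below is one possible way to do that, not the script I actually use: make_run_script and its arguments are hypothetical, and the resource requests are simply copied from the example script.

```shell
#!/bin/bash
# make_run_script is a hypothetical helper: write run<project>_<NN>.sh for a
# given project number, do-file name, and working directory, then print the
# path of the script it wrote. The output directory defaults to Scripts.
make_run_script() {
    local project="$1" dofile="$2" workdir="$3" outdir="${4:-Scripts}"
    local number="${dofile%%_*}"    # "05_exp_multi" -> "05"
    local script="$outdir/run${project}_${number}.sh"
    mkdir -p "$outdir"
    # Unquoted heredoc: $workdir and $dofile are filled in, \$ stays a literal $.
    cat > "$script" <<EOF
#!/bin/bash -l
#\$ -pe local 2
#\$ -l mem_free=3G
#\$ -l h_vmem=4G
#\$ -e efiles
#\$ -o ofiles
cd $workdir
stata-mp -b do $dofile
exit
EOF
    echo "$script"
}
```

For example, make_run_script 017 05_exp_multi '~/ERGOT/000_workspace/017_multi_donor' writes Scripts/run017_05.sh. The working directory is single-quoted so the tilde is written into the script as-is and expanded when the job runs.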