I just wrapped up an 11-day visit to Khartoum, Sudan, where I co-taught, with my colleague Ahmed Siddig, a six-day intensive short course intended to cover designing reliable biodiversity monitoring studies, basic analysis of the resulting data, and the fundamentals of the R programming language. With approximately six hours available on each of the six days (nearly equivalent to the amount of lecture time in an average semester-long course), our initial syllabus included: introductory lectures on biodiversity data needs in sub-Saharan Africa; using biodiversity data to inform environmental policies; sampling design; a morning field exercise to collect biodiversity data on the Shambat campus of the University of Khartoum, in and around the grounds of the Faculty (i.e., Department) of Forestry; a discussion of best practices for data management; basic R (interface, syntax, data entry and manipulation, and exploratory data analysis); and using R for standard ecological statistics applied to biodiversity data (rarefaction and extrapolation, occupancy and detection probability, ordination and classification, and hypothesis testing using regression and analysis of variance). There were also a few skills-building lectures tucked in after hours: success in research and teaching; science communication; and an open discussion of how to apply for overseas opportunities.
This seemed like a reasonable schedule and a reasonable goal for a workshop attended by an expected 20 or so M.Sc. and Ph.D. students, junior and senior faculty, and established forestry and wildlife researchers at government agencies. Files would be shared using Dropbox and we would be building scientific capacity among the current generation of young Sudanese researchers and conservationists. Success was assured.
And then our expectations met reality.
First, and well before I had left the US, demand for the course far exceeded our expectations. Within days of posting the course announcement at universities around Sudan, we had 20 people signed up and a far longer wait-list (despite an enrollment fee of 1500 Sudanese pounds, equivalent to more than a week’s salary for the average Assistant Professor, or about US $75 on the black market, the pound not being a freely exchanged currency). Not knowing if we would ever have the opportunity to do such a course again, I convinced Ahmed that we should let in every applicant. We ended up with 43 energetic, engaged, and thoughtful participants from throughout the country, the last of whom signed up the day before the class started. This was far more than our small lecture room could comfortably hold, but the acoustics were terrible in the larger lecture room and the internet didn’t reach there either, so we rearranged the desks in the smaller room, hung a router on the wall to try to boost the signal, increased the font size in R Console to 36-point so it would be viewable on a 1.5-m² screen, and hoped for the best.
We had asked the participants to install Dropbox, R, and R Studio before they came, so we could start right away. But many of them don’t have internet at home, and even for those who did, navigating the various interfaces to download the software proved, in many cases, overwhelming (and took far too many clicks and screen reloads). So after a round-the-room set of introductions that ran more than twice as long as expected (as did the number of participants), a review of class logistics, and two introductory lectures, we asked the students to download and install the three aforementioned software packages.
We had a wireless router in the classroom that drew on what I can only imagine was a single DSL line, which also served the three floors of the Forest Products wing of the Faculty of Forestry’s building. All of a sudden, 43 people were each trying to download a few hundred megabytes of installers from a World Wide Web that, in unpredictable places, blocks Sudanese IP addresses (Sudan itself doesn’t block anything, but because of the two-decade economic embargo, provisionally lifted by the US on 13 January 2017, pieces of the rest of the world block Sudan). You can guess what happened. Absolutely nothing. In what could charitably be called a reverse denial-of-service attack, the copper overheated, the fiber clogged, downloads ground to a halt and failed, people kept clicking their mice in frustration (leading to further, redundant failed downloads), and minor chaos ensued. The only upside was that no one else in the building was using the internet at the same time.
Why is that, you might ask? I learned, only at the end of the week, that the entire Shambat campus (which, besides Forestry, includes the Faculties of Agriculture, Environment and Geography, Animal Production, and Veterinary Medicine) had been shuttered for a year on 25 January: faculty furloughed with pay, students sent home. The closure followed the arrest of five students from Darfur during a sit-in to protest the year-long suspension of five other students on campus. (Anyone out there remember Darfur? The international humanitarian spotlight has moved on, but it’s still there, and people are still suffering badly under government oppression.) This action came less than a year after the main campus had been shut down indefinitely. Other than our classroom and a few offices around it, the Shambat campus and all its buildings and offices were padlocked and guarded.
Making the best of an unexpected situation, we gamely suggested that folks keep trying, be patient, download overnight at home, or come in early the next morning to stagger the Internet traffic. And since the next morning was going to be mostly field work (Sunday’s biodiversity data collection, photographed by Maysoon Osman), followed by an afternoon lecture on student and researcher skills for success, we thought we had a day of breathing room, by which time everyone would have been able to download, install, and test the needed software.
Monday in the desert in the dry season dawned bright and sandy. Our students trickled in between 0900 and 0930, a not unforeseen occurrence given Khartoum’s traffic, the vicissitudes of what passes for public transportation there, and the simple exigencies of life in a country where the average wage-earner brings in less than US $2,000 a year. Fewer than half had successfully installed Dropbox, let alone R or R Studio, and many of those who had were besieged by dysfunctional operating systems (I saw not a single “authorized” copy of Windows) or had web browsers that were out of date, misconfigured, or simply broken. In many cases, the amazing assortment of laptops and their hopeful stewards were unable to accept an invitation to share a file, because inboxes were hopelessly full, downloads hadn’t succeeded, browsers weren’t “hand-shaking” with Dropbox, or for a multitude of other reasons that have me considering a future career at a Sudanese help desk. Fundamentally, though, the concept of file sharing beyond email attachments or WhatsApp, let alone the ability to navigate the arcane home screen of Dropbox (a pox on the UI designer who thought it would be obvious to click on the rainbow “Sharing” icon to see shared files instead of on the more obvious “Files” icon!), was as foreign to most of our students as was the idea that a Harvard researcher would devote a week of his time to teaching R in Khartoum.
And please, don’t get me started on the home screens of either R or R Studio.
Here are some lessons I learned about R this week. Keep in mind that I’ve been using R and its progenitor, S, since the mid-1980s; have written hundreds of thousands of lines of R code; have co-written an R library; and am a co-PI on a software engineering project developing tools for capturing the provenance of data and analysis in R, work that by necessity gets deep under the hoods of R and R Studio. Keep in mind, too, that the participants all have, or are working toward, advanced degrees in ecology, forestry, natural resource management, conservation, etc.; all had had at least one basic statistics course; and all were keenly interested in learning and using R:
- Standard font sizes of R and R Studio projected with a VGA projector onto a 1.5-m² screen are unreadable a meter away from the screen. In a standard classroom, they’re useless. You can enlarge the fonts in R Console to 36 point (at the cost of having your lines run off the screen after fewer than 25 characters), but as far as I can tell, there’s no way to enlarge the tabs, menu bars, and the rest of the interface in R Studio enough to make them readable from a distance (please do teach me otherwise). While I kept hoping that students would have R Studio on their laptops and could follow along (but see above), I ended up having to work almost exclusively in R Console for demonstrations while waving my hands around R Studio’s interface (“…now this graph would show up in the lower right window…”).
- If you don’t know what a text or a csv file is, read.csv() won’t make any sense.
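- Even if you can figure out how to make read.csv() work, cracked, BitTorrented versions of Excel don’t always put commas between columns (usually a side effect of regional settings that use a comma as the decimal mark). Sometimes you get semicolons (use read.csv2(), or specify sep = ";" and dec = "," in read.csv()), spaces, or tabs (use read.table(), adding sep = "\t" for tabs; read.delim() also reads tab-separated files).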
- The concept of an object is difficult to grasp. Why should one have to assign the output of a function to an object in order to use it? I can’t even tell you how many times I saw students correctly use read.csv() but not assign the result to an object. Naturally, the result echoes to the screen, but it isn’t available afterward. And don’t even try to abstract to pipes (I didn’t); tidyr is only useful if you have something to tidy up.
- The concept of a directory (and hence getwd() and setwd()) is equally challenging. Everyone (save Shah Khalid from Pakistan) was using Windows, because Apple products are unavailable in Sudan, and almost everyone had all their files on the desktop. Using File Explorer to find a file was pretty fancy gymnastics, whereas setwd() was equivalent to transmutation and made about as much sense as turning water into wine (which, of course, is unavailable anywhere in Sudan). Dozens of students would enter the file name (with or without quotes) into getwd(), which takes no arguments, with predictable results.
- Which is to say, R error messages are incomprehensible, even at the best of times.
- And if you need to download ggplot2, along with its 17 other dependent libraries (!), when 42 other students are trying to download the same 30 MB or so at the same time on the same overheated DSL line, and then the power goes out for four hours… well, it’s time for another cup of tea, a deep breath, and a fresh start.
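The three csv “flavors,” the assign-to-an-object habit, and what getwd() and setwd() actually do can all be shown in one short base-R script that needs no package downloads. This is a minimal sketch, not what we ran in class: the file contents (two species and a hypothetical dbh_cm column of trunk diameters) are invented for illustration, and the files are written to tempfile() paths rather than to anyone’s desktop.

```r
# Hypothetical field data, written the three ways spreadsheet exports produce.
comma_txt <- "species,dbh_cm\nAcacia nilotica,12.5\nBalanites aegyptiaca,3.5"
semi_txt  <- "species;dbh_cm\nAcacia nilotica;12,5\nBalanites aegyptiaca;3,5"
tab_txt   <- "species\tdbh_cm\nAcacia nilotica\t12.5\nBalanites aegyptiaca\t3.5"

# Write each version to a temporary file.
path_comma <- tempfile(fileext = ".csv")
path_semi  <- tempfile(fileext = ".csv")
path_tab   <- tempfile(fileext = ".txt")
writeLines(comma_txt, path_comma)
writeLines(semi_txt, path_semi)
writeLines(tab_txt, path_tab)

# Calling read.csv() without assignment only prints the table; nothing is kept.
read.csv(path_comma)

# Assigning the result to an object makes it available for later use.
trees_comma <- read.csv(path_comma)                             # commas, "." decimals
trees_semi  <- read.csv2(path_semi)                             # sep = ";", dec = ","
trees_tab   <- read.table(path_tab, header = TRUE, sep = "\t")  # tabs

# Inspect the structure: two rows, a text column and a numeric column.
str(trees_comma)

# getwd() takes no arguments; setwd() takes a folder path and invisibly
# returns the folder you were in before.
getwd()
old_dir <- setwd(dirname(path_comma))          # move into the temporary folder
trees_again <- read.csv(basename(path_comma))  # the bare file name now suffices
setwd(old_dir)                                 # go back to where we started
```

Note that read.table() without sep = "\t" would split “Acacia nilotica” into two columns, because its default separator is any whitespace.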
We took another break on Tuesday for a larger day-long workshop on establishing an LTER in Sudan (post to come). It was another opportunity for the students to catch up on installation and basics, although most attended the LTER workshop (in the big room, without internet). And did I mention that almost no one here can afford 4G? Coverage isn’t universal even in the capital, and the same is true for 3G, so hot-spotting a WiFi connection, with the ungodly data charges? Fuggetaboutit.
Wednesday. Two-and-a-half days to go. We abandoned the schedule and went back to basics.
We re-installed Dropbox, accepted invitations to share folders, skipped the download to Dropbox desktop, and just worked through the web interface. Then we got rid of Dropbox desktop altogether: its bad default habit of syncing on boot simply crashed the router, and I wasn’t about to walk through desktop Dropbox settings with a group of students for whom “settings” was nearly as strange as a csv file.
We made sure, finally, that everyone (well, almost everyone, anyway) had R and R Studio installed; that the majority knew that csv files came in three flavors and could tell the difference; that point-and-click from the unreadable list of files in the Files tab of R Studio’s lower-right window worked pretty well (much better than interpreting the output of head()); and that nothing would work unless the output was assigned to an object and the object was visible in the unreadable Global Environment window at the top right of R Studio.
And we made sure that everyone could read in the data they had collected on Sunday, entered into an Excel spreadsheet in long format (well, mostly), exported as a csv (separated by commas, semicolons, or tabs), and shared back to me via Dropbox.
Not bad for a six-hour day, but it was day 5, not day 1.
By Thursday, I was feeling the crunch. I wanted them to see something done with their data. Fortunately, it had been shared (viva Dropbox!), so I spent Thursday morning on a quick (2½-hour) overview, illustrated with the class’s data, of three key topics: R syntax, as shown by reading in data, computing means, and subsetting vectors; creating a simple graph (but please, don’t try to install ggplot2 now!); and scripting. In the afternoon, they tried it on their own. Some succeeded! Others asked me how to install Dropbox. That, too, was a metric of success: we were down to a minority of the participants without it, and by then all their data and the R scripts had been made available there for everyone to use (click that rainbow icon!).
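Thursday’s topics (computing means, subsetting vectors, and a simple graph that needs no ggplot2) fit in a few lines of base R. This is a minimal sketch with an invented long-format data frame; the column names plot, species, and count are hypothetical stand-ins for whatever the students’ spreadsheets actually used, and the counts are made up.

```r
# Hypothetical long-format observations: one row per species per plot.
obs <- data.frame(
  plot    = c("A", "A", "B", "B", "C"),
  species = c("Acacia nilotica", "Balanites aegyptiaca",
              "Acacia nilotica", "Ziziphus spina-christi",
              "Acacia nilotica"),
  count   = c(4, 1, 7, 2, 3)
)

# Mean count across all rows.
overall_mean <- mean(obs$count)

# Subsetting a vector with a logical condition: counts for one species only.
acacia_counts <- obs$count[obs$species == "Acacia nilotica"]
acacia_mean   <- mean(acacia_counts)

# Total individuals per plot, using base R only.
plot_totals <- tapply(obs$count, obs$plot, sum)
print(plot_totals)

# A simple base-graphics bar chart; no ggplot2 download required.
# Drawn only in an interactive session so the script also runs unattended.
if (interactive()) {
  barplot(plot_totals, xlab = "Plot", ylab = "Individuals counted")
}
```

Saving these lines in a .R file and running it with source() is the scripting step: the whole analysis can be rerun on a fresh copy of the data without retyping anything.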
Friday, we sacrificed a goat and had a magnificent picnic (I’ll save that for another post, too).
Saturday morning we wrapped up with a lecture on science communication; a demonstration of the pwr package for estimating appropriate sample sizes; the importance of pilot studies and the literature for estimating effect sizes; and the best practice of involving a statistician in the project from the beginning. All those other topics on the schedule? Next time!
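The same kind of sample-size calculation can be done without any download, using power.t.test() from base R’s stats package; the pwr package offers comparable functions such as pwr.t.test(). This is a minimal sketch, not the classroom demonstration: it assumes a two-sample t-test and a hypothetical pilot study suggesting a difference of 1 unit between group means with a standard deviation of 1. Both numbers are placeholders for real pilot or literature estimates.

```r
# Hypothetical effect size from a pilot study: difference between group means
# (delta) and the pooled standard deviation (sd), in the same units.
pilot_delta <- 1
pilot_sd    <- 1

# Leave n unspecified and power.t.test() solves for it.
design <- power.t.test(delta = pilot_delta, sd = pilot_sd,
                       sig.level = 0.05, power = 0.80,
                       type = "two.sample")
print(design)

# n is per group and not an integer; round up when planning field work.
n_per_group <- ceiling(design$n)
n_per_group

# With the pwr package installed, a comparable call uses the standardized
# effect size d = delta / sd:
# pwr::pwr.t.test(d = pilot_delta / pilot_sd, sig.level = 0.05, power = 0.80)
```

Because the required n scales roughly with the inverse square of the effect size, halving the expected difference roughly quadruples the sample needed, which is why a good pilot estimate matters so much.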
All of the above minor carping aside, the week was full of accomplishments.
- We got Dropbox, R, and R Studio installed and opened on 43 laptops;
- Every participant who started the course also finished it, and not a one missed a day;
- Students and faculty learned they can help each other learn and work with software that wasn’t designed with their background, needs, expectations, or life-challenges in mind;
- A community of scientific collaborators was established and given usable tools for file sharing and doing open, transparent, repeatable research.
And it was a tremendous privilege for me to have had the opportunity to work with and learn from such a great group of students and colleagues.