Genetics is arguably the most exciting thing happening today. It is changing the world as rapidly and dramatically as the advent of the internet. edX’s “Introduction To Biology – The Secret of Life” left me wanting to know more. As I looked for follow-up courses that would go deeper, Bioinformatics stood out as the ideal way to build on my engineering background. “Bioinformatics Algorithms (Part 1)”, which I’ll call “Bioinformatics 1”, is the first of a two-part class offered by Coursera. This review covers the Fall 2014 session.
Bioinformatics introduces biology to computers. It is the study of how to use computers to analyze DNA and amino acid sequences. For example, the following two questions are not only answered, but implemented by each Bioinformatics 1 student:
1) How can you assemble a complete genome sequence from many fragments? Scientists have developed techniques to take a long strand of DNA, chop it into pieces, and find the exact sequence of DNA “letters” in each piece. But how can you take a vast collection of short sequences and reassemble them into a single contiguous sequence?
2) How do you find DNA sequences shared by two different species? This might sound easy, but consider that the DNA of a common ancestor will have millions of years’ worth of insertions, deletions, and mutations in both species. Still sound easy?
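To give a feel for why the second question is hard, here is a minimal Python sketch of edit distance, a standard dynamic-programming measure of how many insertions, deletions, and substitutions separate two strings. It is only an illustration of the kind of reasoning involved. The function name is my own, and I am not claiming this is the exact algorithm the course assigns.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 letters of a
    # and the first j letters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning i letters of a into an empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitute (1)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("ACGT", "AGT"))  # 1: a single deletion
```

The table has one cell per pair of positions, so the cost grows with the product of the two lengths. That is fine for short strings and a real problem for whole genomes.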
I differ with the official recommendations regarding prerequisites. In my opinion you do need an introductory DNA class. It is taken for granted that you already know that DNA consists of four letters, what a reverse complement is, what RNA is, how the 4 letters encode 20 amino acids, etc. Bioinformatics 1’s coverage of these topics is only adequate as a refresher.
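As an example of the assumed background, a reverse complement swaps each letter for its pairing partner (A with T, C with G) and then reverses the result. A minimal Python sketch follows. The helper name is hypothetical and not taken from the course materials.

```python
# Pairing rules for the four DNA letters: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```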
You do need more programming experience than merely completing an introductory class. This is not absolute; you might be the exception. But I recently took the recommended introductory Python class and there is no way it was sufficient preparation. This class is for strong swimmers. For those still learning to swim, I recommend first taking “Algorithms, Biology, and Programming for Beginners”, taught by the same instructors.
You do not need Python. Use your favorite language. Python probably was the most-used language, but I chose C++ because of its reputation for speed and its common usage by electrical engineers like me. When debugging problems like accessing non-existent elements in an array, I knew that other languages would have saved me the trouble. However, my 14-year-old Compaq Evo W8000 workstation was fast enough to complete all class assignments using C++. I don’t know if that would be true for all languages.
You do not need to purchase the printed textbook. The free online “interactive text” has the same material, but the “interactive” part has no print equivalent. For example, there are frequent online “Exercise Breaks” that usually ask a question with a numerical answer. You enter your best guess, and the website will tell you if it was correct.
The three instructors, Pavel Pevzner, Phillip Compeau, and Nikolay Vyahhi, are literally and figuratively world-class. Though based at the University of California San Diego, they must rack up lots of frequent flier miles as they concurrently work on bioinformatics projects in Russia.
Despite what Dr. Pevzner self-described as a “heavy Russian accent”, the team’s English was perfectly understandable, and their international background made the class more enjoyable. For example, the famous mathematical challenge “The Seven Bridges of Königsberg” was introduced with a video featuring Dr. Pevzner at the historical Königsberg site in Russia.
The material was presented in two ways, through video lectures and an “interactive text”. The videos varied in length from three to ten minutes. They covered the same material as the text, but without some of the gritty details. What worked best for me was to use each video as an introduction to the corresponding text.
Each chapter of the “interactive text” was subdivided into a dozen or so manageable pieces, plus “Charging Stations” and “Detours” with additional information. Making technical material readable and interesting isn’t easy, but Compeau and Pevzner largely succeeded. The introduction to each chapter included a humorous, yet baffling, illustration. For example, imagine a giant stack of pancakes sitting on the ground, with a huge crack running through them; in the distance sits an Egyptian Sphinx with Susumu Ohno’s face. By the time we completed the chapter, its picture made sense, more or less. Another humorous aspect to the class was the consistent depiction of the instructors as Old West cowboys.
GRADING & ASSIGNMENTS
We were graded on just two things: programming assignments (80%) and quizzes (20%). All programming assignments were weighted the same, 10 points each, and they totaled 550 points. In each case, there would be (sometimes vague) instructions, a simple test data set with expected results, and a large data set, also with expected results. A single assignment could take as little as 15 minutes or as much as 10 hours. Each time, I would (1) write a program that read in the simple data set, then debug it until it spit out the expected results, (2) attempt to process the large data set, and usually do more debugging to get the expected results, then (3) attempt to process the unique, graded data set and return the answer within the limit of 5 minutes. Proof of improvement between 2013 and 2014: not once did I have a problem with the automatic grader (although many times I falsely accused it mentally before finding my own bug). The 2013 forums mentioned some problems with the grader, which obviously got solved.
One theme of the class was the importance of using efficient algorithms. For almost every assignment, there was an obvious, and wrong, way to solve the problem. The wrong way would give the right answer, but not fast enough. In each case, there was a better way to accomplish the same task in a reasonable amount of time. I once re-wrote a program three times to get it to run fast enough. The third version took 20 minutes to run; the fourth version, implementing advice from the forums, took less than a second!
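To illustrate the "obvious but too slow" pattern in general terms, here is a Python sketch of finding the most frequent k-letter substrings (k-mers) in a DNA string. This is not the program I rewrote, and the function names are my own. The first version rescans the whole text for every position; the second counts every k-mer in a single pass with a dictionary-based counter.

```python
from collections import Counter


def most_frequent_kmers_slow(text: str, k: int) -> set[str]:
    """Obvious approach: for each k-mer, rescan the entire text."""
    best, best_count = set(), 0
    last_start = len(text) - k + 1
    for i in range(last_start):
        pattern = text[i:i + k]
        # Counting occurrences costs a full pass over the text each time.
        count = sum(1 for j in range(last_start) if text[j:j + k] == pattern)
        if count > best_count:
            best, best_count = {pattern}, count
        elif count == best_count:
            best.add(pattern)
    return best


def most_frequent_kmers_fast(text: str, k: int) -> set[str]:
    """Single pass: tally every k-mer once, then keep the top scorers."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        return set()
    top = max(counts.values())
    return {kmer for kmer, c in counts.items() if c == top}


dna = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
print(sorted(most_frequent_kmers_fast(dna, 4)))  # ['CATG', 'GCAT']
```

Both functions return the same answer. The difference is that the slow version does work proportional to the square of the text length, which only becomes painful on large data sets.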
Quizzes were short and sweet. Generally there were a few questions that could be quickly answered if you had read the material. There were also a handful of questions that required reusing a program already written for a programming exercise. Although re-taking of quizzes was allowed, I never did.
In an average week, I spent a half hour watching videos, an hour reading the text, and about 18 hours programming and debugging. I worked on the class every day. It took more time than the syllabus suggested, but I am just a self-taught C++ programmer. A computer science major with a couple of years’ experience could probably complete the assignments in half the time.
A background theme of the class was the superiority of a well-executed online course, compared to the “traditional classroom lecture plus homework” format. Bioinformatics 1 was the showcase; we were the beneficiaries. Dr. Pevzner believes students’ time with an instructor is best used answering questions and helping students who are stuck, and he makes a convincing argument.
There is a satisfaction that only comes from working hard and achieving something difficult. At the end of this course, I was mentally exhausted, but I felt that satisfaction. Am I taking Bioinformatics 2? You bet!