I took the first offering of this, the second course in the Genomic Data Science specialization, and there are a number of issues that I hope the specialization team can work out. The video lectures are short (20-30 minutes total per module), and the introduction to working with Galaxy is reasonably interesting. The explanations given by Dr. Taylor are fairly good, but as with other courses in this specialization and in the Data Science specialization, the depth of the instruction is not quite enough to prepare students for the final project.
A fair amount of time is spent on demonstrating how to run the Galaxy software system in the cloud or locally on your own machine. Some of this is problematic. Galaxy does not play well with Windows, and the only reliable way to install Galaxy on a Windows machine is to run an instance of Linux (e.g., Ubuntu) either as a second OS or as a virtual machine. The instructors also suggest Amazon AWS as a cloud provider for those wanting to run Galaxy in the cloud.
You do not need to do any of this! By all means watch the videos so you know how it's done (you need to watch them for the associated quiz anyway). One person posted in the forums that he was charged $60 after he left several instances running in his Amazon AWS account overnight, even though they weren't actually doing anything. Others later reported similar charges of $100-$300. The Galaxy website allows 250 GB of storage per registered user and has plenty of processor time for students to run the tools needed for the demonstrations and final project in a reasonably timely fashion, especially if you do it at night when most of the researchers using the platform have finished their work for the day.
The final project requires you to determine the number and type of variants from sequence data from a father/mother/daughter trio. The tools are all available in Galaxy main, but there is not enough background information given to make the process intuitive, nor were certain essential questions answered (e.g., should I analyze each sample individually or pool all the subjects into the same sample before analysis?). One issue was that, this being the first offering of this course, there were no community TAs available to answer questions, so students had to rely on the instructor for guidance, and he was of course not often involved in the discussion forums.
Some internet resources are available to help with this (see this guide on variant calling and this Nature Genetics article). As a note, though, I could not get the listed workflow from the first resource to work for the data we were given and ended up trying to work through it by essentially picking tools based on their names/descriptions. I was able to get an answer, but there was no way to tell prior to submission if my answer was correct, close, or completely wrong. Unfortunately, it turned out that there was no way to tell when evaluating other students' projects whether their answers were right or wrong either, despite the fact that the rubric asks you to do just that. Additionally, the rubric asks evaluators to assess whether a particular variant was present in the .vcf file, and this variant was not called using the hg19 reference genome that was used for the Galaxy demonstrations (and confirmed by the instructor in one of his rare forum posts to be appropriate for the project). To his credit, the instructor resolved this (after someone sent him a direct email).
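For anyone unsure what "number and type of variants" or "is this variant present in the .vcf file" means in practice, here is a minimal Python sketch. It is an illustration only, not the workflow from the course or from either resource above. It assumes a plain-text, tab-separated VCF downloaded from Galaxy, and the function names (`classify`, `count_variants`, `has_variant`) are my own. It does not answer the per-sample versus pooled question; it just tallies whatever file you give it.

```python
from collections import Counter


def classify(ref, alt):
    """Label one REF/ALT pair as SNV, MNV, insertion, deletion or other."""
    if alt.startswith("<") or "[" in alt or "]" in alt or alt == "*":
        return "other"  # symbolic, breakend or spanning-deletion allele
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    return "insertion" if len(alt) > len(ref) else "deletion"


def data_rows(vcf_lines):
    """Yield the tab-separated fields of each non-header, non-blank line."""
    for line in vcf_lines:
        if line.strip() and not line.startswith("#"):
            yield line.rstrip("\n").split("\t")


def count_variants(vcf_lines):
    """Count ALT alleles by type (columns: CHROM POS ID REF ALT ...)."""
    counts = Counter()
    for fields in data_rows(vcf_lines):
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):
            if alt != ".":  # "." means no alternate allele was called
                counts[classify(ref, alt)] += 1
    return counts


def has_variant(vcf_lines, chrom, pos, ref, alt):
    """Exact-match lookup; assumes both sides use the same representation."""
    for fields in data_rows(vcf_lines):
        if (fields[0] == chrom and int(fields[1]) == pos
                and fields[3] == ref and alt in fields[4].split(",")):
            return True
    return False
```

You would call it as `count_variants(open("daughter.vcf"))`, where `daughter.vcf` is a hypothetical file name. Note that the lookup is only meaningful if the file was called against the same reference genome as the variant you are checking for, which is exactly where the rubric ran into trouble.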
Overall, two stars. There is some value here, but the expectations aren't particularly clear, and the course project (if done correctly) is well beyond what is taught in the lectures. This would be fine (and indeed it's characteristic of the Data Science Specialization and this specialization), but the resources online that might help with actually completing the project are confusing, contradictory, or deprecated.