What Every Software Engineer Ought to Know About Data Science

This seminar was held by the UVic Matrix Institute. The speaker, Dr. Greg Wilson, has worked for 35 years in both industry and academia, and is the author or editor of several books on computing and two for children. He is best known as the co-founder of Software Carpentry, a non-profit organization that teaches basic computing skills to researchers, and is now part of the education team at RStudio.

Abstract: Engineering has been defined as “the use of the scientific method to design and build new things”, but software engineering courses rarely require students to conduct experiments or analyze data. This talk describes what such a course would look like, what its benefits would be, and how we can get there from here.

Data Science for Software Engineering

In data science, 90% of the work is cleaning the data.

What is software engineering? Applying the scientific method to the design and construction of novel artifacts. -- Greg Wilson

Problem:

  • We don't teach undergraduates about programs, programming, or programmers.
  • Existing software engineering courses are not practical.

Example:

  • The Empirical Software Engineering Using R project: the author redid the tests in published papers by contacting the original authors and obtaining their datasets. After redoing them, the author found that many were wrong (for instance, because a method had been misapplied, or for some other reason).

“I've got a hammer, so that must be a nail” is how many data scientists approach problems, and it is wrong.

Book mentioned:

Cheating Lessons: Learning from Academic Dishonesty