
Meandering Data

The random wanderings of a physicist in the world of data science

Data be Nimble, Data be Quick

Posted May 19, 2014

At times, research in academia seems to operate at a snail’s pace. Results need to be understood in a deep and insightful way. (Need another month or two before you present your results? Sure!) Rushing results can lead to retracted publications and scientific embarrassment. Combine this with teaching positions, a multitude of unrelated projects, and mentoring undergraduates, and academics tend to develop a “things get done when they get done” mentality. And it works. Sort of.

As a physicist, I spent far more time manipulating data and writing software than I did doing real physics. Such is the case for many scientists who deal with lots of data. Part of the reason is that scientists pick up programming on the go, rarely with a foundation in computer science. Typically, either the software that gets written is hacky, fragile, and horrible to share, or productivity suffers as individuals spend weeks learning what “real programmers” might do.

This won’t fly for a data scientist. You need to be nimble. Like a bird, or Spider-Man. You know what nimble is, and you can spot a nimble programmer when you see her, but how do you become more nimble yourself? Here are my suggestions:

  1. Separate logical tasks into small blocks of effort with precise definitions.
  2. Know what you want to accomplish in the next 30 minutes before putting your hands on your keyboard.
  3. Know what matters and what doesn’t. (Perfectionists, I’m talking to you.)
  4. At the end of the day, summarize what slowed you down and what sped you up as you worked. Shed the slow habits.
  5. Practice new technologies even if you don’t think you’ll use them, just to practice learning.
  6. Give yourself 15 minutes to learn a new programming language (try listing prime numbers in Go or Julia) just as an exercise. Focus on learning exactly what you need to know and move on.
  7. Keep a list of tasks you perform often. Can you improve them with code reuse or automation? Yes, you can. Do it.

There’s a lot of overlap between being nimble and being efficient. However, being nimble is more about learning and constantly reacting to your workflow without losing sight of your goal.