Software engineering for data scientists – Part 1, Development Tools

I have bad news: being able to write programs is only a tiny fraction of software development and data science. You have to know a lot of things, and this can become very frustrating. You will be bombarded with acronyms and oddly named tools: IDE, git, version control, CI, and so on. At some point you have to start using the tools of the trade alongside coding, and here we give you some advice on what to use to become a pro.

A beginner programmer has great power, and with it comes great responsibility. Writing a script to collect some data from the web, cleaning up the data, building a fancy ML model and making predictions is the ultimate goal of the aspiring data scientist. This is a complex process, and if you can do it, please be proud of yourself. But keep in mind that things are much more complicated in real life! In real life you will have to work with others, so you are responsible to them. Your project can generate valuable information even after a single run. Ideally, though, you want to build something that runs continuously, something that can be repeated many times over time. In a nutshell, you want to build software. It doesn’t matter whether you are working for profit, in academia or just for fun. A serious data scientist or developer is able to work in a team and can build usable and stable software as part of it. You don’t have to know everything from day one, but you have to accept that you should invest lots of time in learning the basics of software engineering and in keeping yourself up to date throughout your career. This series tries to help you get started with three main areas of software engineering:

  • Development tools (e.g. operating system, IDEs, version control, etc.)
  • Software development best practices (e.g. testing, packaging, API design and development, continuous integration/deployment)
  • Software engineering methodologies (e.g. Agile, CRISP-DM, ML guidelines etc.)

Learn Linux and version control

This open MIT course is probably the best source for learning to use *nix-type operating systems. Don’t worry, you can use bash and all the tools the course mentions on Windows and Mac too (by the way, macOS is itself a *nix-type OS). Yes, you should know a little bit about operating systems, and Linux is still the best choice for development. This wonderful course will teach you to use the bash shell tools effectively to command your machine, to use git for version control, and to profile and debug your Python code. It even introduces you to Vim, a programmer’s editor.
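To give a taste of what profiling Python code looks like, here is a minimal sketch that uses only the standard library modules cProfile and pstats. The function names slow_sum and profile_call are made up for this illustration and do not come from the course.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Sum the squares below n with a deliberately plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Collect the statistics into a string instead of printing them directly.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show at most five entries
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum, 100_000)
    print(result)
    print(report)
```

The report lists how many times each function was called and how much time was spent in it, which is usually enough to spot the slow part of a small script.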

Use PyCharm or another mature IDE

We love programmers’ editors like vim and emacs, but it takes ages to configure them for serious programming. Integrated Development Environments, or IDEs for short, are built for developers and come with everything you need for development: code completion, an interactive REPL, a terminal, lots of plugins for various libraries, and so on. They are like text editors on steroids! We are biased towards PyCharm, but Microsoft’s Visual Studio is also a good one. The only software we pay for is PyCharm Professional Edition. If you are a student, or you don’t want to pay for an IDE, try its Community Edition.

Use virtual environments

If you have been using Python for more than a year, you should be aware that there is a new Python release every year. There are also various Python implementations, e.g. PyPy, IronPython, etc. Packages also have versions, and they import other packages, and so on. Each package has its own dependencies, so things can get messy if you install each and every package into the main Python interpreter of your machine. Virtual environments make it possible to use a separate Python environment for each of your projects, so you can use Python 3.7 for one project and Python 3.9 for another. The Hitchhiker’s Guide to Python is an awesome resource for learning more about advanced Python development practices; we especially love the sections on setting up and using virtual environments.
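As a rough illustration, a per-project environment can be created with the standard library venv module. The helper name create_project_env and the .venv folder name are assumptions made for this sketch, not something the guide prescribes, and pip is left out by default to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir, with_pip=False):
    """Create a virtual environment in <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"  # ".venv" is only a common convention
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


if __name__ == "__main__":
    # A temporary directory stands in for a real project folder here.
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_project_env(tmp)
        # pyvenv.cfg records which interpreter the environment is based on.
        print((env_dir / "pyvenv.cfg").read_text())
```

The environment is tied to the interpreter that runs the script, so a project that needs a different Python version should create its environment with that version’s interpreter.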

Use Black or any other code formatter

Consistent and idiomatic code style makes your code readable for other human beings (including you). We are not good at this, so we started using Black, the uncompromising code formatter. If you are a purist, or you just don’t like Black’s decisions, you can find other formatters. Find the one you and your teammates like, and use it!

Document your code

Document your code using docstrings. Follow a docstring style guide; we recommend the Google style guide. Using tools like Sphinx, you can generate HTML documentation and host it on Read the Docs.
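Here is a small sketch of what a Google-style docstring can look like. The function normalize is a made-up example; the Args, Returns and Raises sections follow the layout of the Google style guide.

```python
def normalize(values, lower=0.0, upper=1.0):
    """Rescale numbers linearly into the range [lower, upper].

    Args:
        values: A non-empty list of ints or floats.
        lower: The smallest value of the output range.
        upper: The largest value of the output range.

    Returns:
        A list of floats, in the same order as the input.

    Raises:
        ValueError: If values is empty or all of its items are equal.
    """
    if not values:
        raise ValueError("values must not be empty")
    smallest, largest = min(values), max(values)
    if smallest == largest:
        raise ValueError("values must contain at least two distinct numbers")
    scale = (upper - lower) / (largest - smallest)
    return [lower + (v - smallest) * scale for v in values]


if __name__ == "__main__":
    # help() shows the docstring, just as an IDE tooltip would.
    help(normalize)
```

The same text appears in the interactive help, in IDE tooltips and in generated documentation, so it only has to be written once.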


You should know the basics of your operating system, since it does a great deal of the work even if you are using Python. A good IDE helps you write clean and readable code, and it makes debugging much easier. Documenting your work is good for your teammates and for your future self too. Using a version control system helps you separate the working version of your project from the development version, and it also makes teamwork much easier.


The header image was downloaded from Pixabay.

