Analyzing GitHub, How Developers Change Programming Languages over Time

Have you ever struggled with yet another obscure project, thinking: “I could do the job with this language, but why not switch to another one that would be more enjoyable to work with?” In his awesome blog post, The eigenvector of “Why we moved from language X to language Y”, Erik Bernhardsson generated an N×N contingency table of all Google queries related to changing languages. However, when I read it, I couldn’t help wondering what proportion of people actually switched. So it seemed worthwhile to take the idea further and see how the popularity of languages changes among GitHub users.

Dataset available

Thanks to our data retrieval pipeline, source{d} has made available a dataset containing the yearly number of bytes of code written by each GitHub user in each programming language. In a few figures:

  • 4.5 million GitHub users
  • 393 different languages
  • 10 TB of source code in total
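To make the idea concrete, here is a minimal sketch of how per-user, per-year byte counts could be turned into a table of language switches. It is only an illustration, not the method used in the full post: the record layout (user, year, language, bytes), the sample rows, and the helper names `dominant_language_per_year` and `transition_counts` are all assumptions, and "switching" is simplified to a change in the language with the most bytes from one active year to the next.

```python
from collections import Counter, defaultdict

# Hypothetical records: (user, year, language, bytes). The real dataset's
# schema may differ; these rows are made up for illustration.
records = [
    ("alice", 2014, "Java", 9000),
    ("alice", 2014, "Python", 1000),
    ("alice", 2015, "Python", 7000),
    ("alice", 2015, "Java", 500),
    ("bob", 2014, "Ruby", 4000),
    ("bob", 2015, "Ruby", 3000),
    ("bob", 2016, "Go", 6000),
]


def dominant_language_per_year(rows):
    """Map each user to {year: language with the most bytes that year}."""
    totals = defaultdict(Counter)  # (user, year) -> bytes per language
    for user, year, language, n_bytes in rows:
        totals[(user, year)][language] += n_bytes
    dominant = defaultdict(dict)
    for (user, year), per_language in totals.items():
        dominant[user][year] = per_language.most_common(1)[0][0]
    return dominant


def transition_counts(rows):
    """Count (from_language, to_language) pairs between consecutive active years."""
    counts = Counter()
    for years in dominant_language_per_year(rows).values():
        ordered = [years[y] for y in sorted(years)]
        for previous, current in zip(ordered, ordered[1:]):
            counts[(previous, current)] += 1
    return counts


transitions = transition_counts(records)
for (source, target), n in sorted(transitions.items()):
    print(f"{source} -> {target}: {n}")
```

Each (from, to) pair is one cell of an N×N table; pairs where both languages are the same count the users who stayed put, which is what allows a proportion of switchers to be estimated.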
