Until starting the Vacation Research Scholarship, I had only used MATLAB and R when writing code. At the recommendation of my supervisor, Dr Lewis Mitchell, I used my scholarship as an opportunity to learn the basics of Python. Python’s range of powerful data analysis packages made it ideal for my project, which was centred around the analysis of large data sets.
Again, at the recommendation of my supervisor, I installed the Python distribution ‘Anaconda’ and predominantly used ‘Jupyter Notebooks’ – this was much more beginner-friendly than accessing Python from the command line. I found that the syntax was reasonably similar to that of MATLAB, which meant that I was able to pick it up quite quickly.
As I mentioned earlier, Python has a huge range of packages. Many of these are designed to help in data analysis. For example, as part of my project I needed to fit power-law distributions to data. Writing my own code to perform this task would have been quite tedious, but thankfully I was able to use the aptly named ‘powerlaw’ package instead. This seems to be quite typical of programming in Python – whenever you hit an obstacle, check to see if there is already a package designed to overcome it. Most of the time, someone else will have already faced the same problem, and may have written a package designed to help you solve it.
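To give a rough idea of what fitting a power law involves, here is a minimal sketch using only the standard library. It is not the ‘powerlaw’ package's implementation, nor the code from the project. It assumes a continuous power law with a known lower cutoff `xmin`, and uses the standard maximum-likelihood estimate of the exponent. The function names and the synthetic data are hypothetical and exist purely for illustration.

```python
import math
import random


def fit_power_law_exponent(data, xmin):
    """Maximum-likelihood estimate of alpha for p(x) ~ x^(-alpha), x >= xmin.

    Assumes a continuous power law and a known xmin (both are assumptions
    of this sketch). Returns (alpha, approximate standard error).
    """
    tail = [x for x in data if x >= xmin]  # only the tail follows the law
    if not tail:
        raise ValueError("no data points at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal xmin; exponent is undefined")
    alpha = 1.0 + len(tail) / log_sum
    sigma = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, sigma


def sample_power_law(alpha, xmin, n, rng):
    """Draw n synthetic values from a power law by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic data with a known exponent, so the estimate can be sanity-checked.
rng = random.Random(0)
synthetic = sample_power_law(alpha=2.5, xmin=1.0, n=20000, rng=rng)
alpha_hat, sigma = fit_power_law_exponent(synthetic, xmin=1.0)
print(f"estimated alpha = {alpha_hat:.3f} +/- {sigma:.3f}")
```

With real data the cutoff `xmin` is usually unknown and has to be estimated as well, which is one of the tedious steps that a dedicated package takes care of.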
Another part of my project required me to collect my own time-resolved data about Reddit content. While this data could have been collected with web-scraping code, I was able to use the Python package ‘praw’ instead, which allowed me to access Reddit’s data in real time via the Reddit API. Because the Reddit API had already been integrated into ‘praw’, using the package was a much neater solution. However, I still chose to experiment with some web-scraping code, as it will be very useful if I ever want to collect time-resolved data from a different website in the future.
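As an illustration of what collecting time-resolved data can look like, the sketch below polls a data source at regular intervals and records a timestamp with each observation. It is not the code from the project. The helper names `collect_snapshots` and `make_praw_fetcher` are hypothetical, as are the choice of the "hot" listing and the recorded fields. The credentials are placeholders that would need to be replaced with those of a registered Reddit application.

```python
import time


def collect_snapshots(fetch, n_samples, interval_s, sleep=time.sleep, clock=time.time):
    """Call fetch() n_samples times, interval_s seconds apart.

    fetch must return an iterable of (post_id, score) pairs. Each pair is
    stored with the time at which that round of sampling started.
    """
    rows = []
    for i in range(n_samples):
        timestamp = clock()
        for post_id, score in fetch():
            rows.append((timestamp, post_id, score))
        if i < n_samples - 1:  # no need to wait after the final sample
            sleep(interval_s)
    return rows


def make_praw_fetcher(subreddit_name, limit=10):
    """Hypothetical helper that builds a fetch function backed by 'praw'.

    Requires the third-party 'praw' package and real API credentials;
    the strings below are placeholders, not working values.
    """
    import praw  # imported here so the rest of the sketch runs without it

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="time-series-example by u/your_username",
    )

    def fetch():
        posts = reddit.subreddit(subreddit_name).hot(limit=limit)
        return [(post.id, post.score) for post in posts]

    return fetch


# Demonstration with a stand-in data source, so no network access is needed.
def fake_fetch():
    return [("abc", 10), ("def", 3)]


demo_rows = collect_snapshots(fake_fetch, n_samples=3, interval_s=0, sleep=lambda s: None)
print(len(demo_rows), "rows collected")
```

Swapping `fake_fetch` for `make_praw_fetcher("some_subreddit")` would, under the assumptions above, record how the scores of a subreddit's current posts change over time.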
Hopefully, these examples will raise your awareness of how Python and its many packages provide very powerful tools for data analysis. I am very pleased to have had the opportunity to learn some basic Python code over the course of my VRS, and look forward to being able to use it in the future.
John Davey was a recipient of a 2018/19 AMSI Vacation Research Scholarship.