About this Blog

Although I am interested in many things, this blog focuses on the disciplines of ‘data science’ and ‘machine learning engineering’. Given that these are nebulous and over-used catch-all phrases, I’ll be more specific: this is a blog about everything involved in turning raw data into ‘information’ that one could ‘do something’ with, via tangible ‘end products’. As I see it, this covers the methods and tools used for:

  • data storage;
  • data extraction and transformation (ETL);
  • data exploration;
  • data modeling; and,
  • delivering results, decisions and actions.

I am particularly interested in Python, the ‘PyData stack’, Spark, Elasticsearch, Docker and Kubernetes, all from an OS X user’s frame of reference. These are my day-to-day tools, along with pencil and (squared) paper.