Although I am interested in many things, this blog focuses on the disciplines of ‘data science’ and ‘machine learning engineering’. Because these are nebulous, overused catch-all phrases, I’ll be more specific: this is a blog about everything involved in turning raw data into ‘information’ that one can ‘do something’ with, via tangible ‘end products’. As I see it, this covers the methods and tools used for:
- data storage;
- data extraction, transformation and loading (ETL);
- data exploration;
- data modeling; and,
- delivering results, decisions and actions.
I am particularly interested in Python, the ‘PyData stack’, Spark, Elasticsearch, Docker and Kubernetes, all from an OS X user’s frame of reference. These are my day-to-day tools, along with pencil and (squared) paper.