About this Blog

Although I am interested in many things, this blog focuses on data science and machine learning engineering and operations. Because these are nebulous, overused catch-all phrases, I’ll be more specific: this is a blog about everything involved in turning raw data into information that one can act on, via tangible end products that deliver value to someone or something. As I see it, this covers the methods and tools used for:

  • data ingestion and storage;
  • data extraction, transformation and loading (ETL);
  • data exploration;
  • data modeling;
  • engineering machine learning systems; and,
  • automating the production of results, decisions and actions of any kind.

I am particularly interested in Python, the ‘PyData stack’ (NumPy, SciPy, scikit-learn, PyMC3, etc.), Apache Spark, Elasticsearch, Docker and Kubernetes, all from an OS X user’s frame of reference. These are my day-to-day tools, along with pencil and (squared) paper.