Machine Learning Pipelines for R

Posted on Mon 08 May 2017 in r • Tagged with machine-learning, data-processing


Building machine learning and statistical models often requires pre- and post-transformation of the input and/or response variables, prior to training (or fitting) the models. For example, a model may require training on the logarithm of the response and input variables. As a consequence, fitting and then generating predictions from …

Continue reading

elasticsearchr - a Lightweight Elasticsearch Client for R

Posted on Mon 28 November 2016 in r • Tagged with data-processing, data-stores


Elasticsearch is a distributed NoSQL document store, search engine and column-oriented database, whose fast (near real-time) reads and powerful aggregation engine make it an excellent choice as an ‘analytics database’ for R&D, production use or both. Installation is simple: it ships with default settings that allow it to work effectively out-of-the-box …

Continue reading

Asynchronous and Distributed Programming in R with the Future Package

Posted on Wed 02 November 2016 in r • Tagged with data-processing, high-performance-computing


Every now and again someone comes along and writes an R package that I consider to be a ‘game changer’ for the language and its application to Data Science. For example, I consider dplyr one such package as it has made data munging/manipulation that much more intuitive and more …

Continue reading

An R Function for Generating Authenticated URLs to Private Web Sites Hosted on AWS S3

Posted on Mon 19 September 2016 in r • Tagged with AWS


Quite often I want to share simple (static) web pages with colleagues or clients. For example, I may have written a report using R Markdown and rendered it to HTML. AWS S3 can easily host such a simple web page (e.g. see here), but it cannot offer …

Continue reading