Bayesian Regression in PyMC3 using MCMC & Variational Inference

Posted on Wed 07 November 2018 in data-science • Tagged with machine-learning, probabilistic-programming, python, pymc3


Conducting a Bayesian data analysis - e.g. estimating a Bayesian linear regression model - will usually require some form of Probabilistic Programming Language (PPL), unless analytical approaches (e.g. based on conjugate prior models) are appropriate for the task at hand. More often than not, PPLs implement Markov Chain Monte Carlo …


Continue reading

Machine Learning Pipelines for R

Posted on Mon 08 May 2017 in r • Tagged with machine-learning, data-processing


Building machine learning and statistical models often requires pre- and post-transformation of the input and/or response variables, prior to training (or fitting) the models. For example, a model may require training on the logarithm of the response and input variables. As a consequence, fitting and then generating predictions from …
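To illustrate the issue with a minimal base R sketch (not the pipeline machinery the post goes on to develop, and using the built-in mtcars data purely for illustration), a model trained on log-transformed variables needs its predictions transformed back before they are of any use:

```r
# pre-transformation: model log(mpg) as a function of log(wt)
fit <- lm(log(mpg) ~ log(wt), data = mtcars)

# predictions come back on the log scale ...
new_cars <- data.frame(wt = c(2.5, 3.5))
pred_log <- predict(fit, newdata = new_cars)

# ... so they need the inverse (post-) transformation before use
pred_mpg <- exp(pred_log)
pred_mpg
```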


Continue reading

elasticsearchr - a Lightweight Elasticsearch Client for R

Posted on Mon 28 November 2016 in r • Tagged with data-processing, data-stores


Elasticsearch is a distributed NoSQL document store, search engine and column-oriented database, whose fast (near real-time) reads and powerful aggregation engine make it an excellent choice as an ‘analytics database’ for R&D, production use, or both. Installation is simple, and it ships with default settings that allow it to work effectively out-of-the-box …
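For a flavour of the client in use, here is a minimal sketch assuming a local cluster on the default port and the resource-and-operator interface described in the package README; the index and document-type names are placeholders:

```r
library(elasticsearchr)

# point at an Elasticsearch resource: a cluster URL, an index and a document type
# (all three values below are placeholders)
es <- elastic("http://localhost:9200", "iris", "data")

# index a data frame, then query everything back again
es %index% iris
everything <- query('{"match_all": {}}')
es %search% everything
```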


Continue reading

Asynchronous and Distributed Programming in R with the Future Package

Posted on Wed 02 November 2016 in r • Tagged with data-processing, high-performance-computing


Every now and again someone comes along and writes an R package that I consider to be a ‘game changer’ for the language and its application to Data Science. For example, I consider dplyr one such package, as it has made data munging/manipulation that much more intuitive and more …
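As a taste of the style of programming the package enables, a minimal sketch of asynchronous evaluation (the slow computation is just a stand-in):

```r
library(future)

# evaluate futures in parallel background R sessions
plan(multisession)

# start a long-running computation without blocking the current session
f <- future({
  Sys.sleep(5)          # stand-in for an expensive computation
  mean(rnorm(1e6))
})

# ... carry on with other work, then block only when the result is needed
value(f)
```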


Continue reading

An R Function for Generating Authenticated URLs to Private Web Sites Hosted on AWS S3

Posted on Mon 19 September 2016 in r • Tagged with AWS


Quite often I want to share simple (static) web pages with other colleagues or clients. For example, I may have written a report using R Markdown and rendered it to HTML. AWS S3 can easily host such a simple web page (e.g. see here), but it cannot offer …
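The underlying idea - sketched below with placeholder credentials and using the older AWS ‘version 2’ query-string signing scheme, which may differ from what the post's function ultimately implements - is to sign an expiring GET request with the account's secret key:

```r
library(digest)
library(base64enc)

# placeholder credentials and object location - not real values
access_key <- "MY_ACCESS_KEY"
secret_key <- "MY_SECRET_KEY"
bucket     <- "my-private-site"
object_key <- "index.html"

# the URL expires at a fixed point in time (epoch seconds)
expires <- as.integer(Sys.time()) + 3600

# sign "GET\n\n\n<expires>\n/<bucket>/<key>" with the secret key (HMAC-SHA1)
string_to_sign <- paste0("GET\n\n\n", expires, "\n/", bucket, "/", object_key)
signature <- base64encode(hmac(secret_key, string_to_sign, algo = "sha1", raw = TRUE))

# assemble the authenticated URL
paste0(
  "https://", bucket, ".s3.amazonaws.com/", object_key,
  "?AWSAccessKeyId=", access_key,
  "&Expires=", expires,
  "&Signature=", utils::URLencode(signature, reserved = TRUE)
)
```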


Continue reading

Building a Data Science Platform for R&D, Part 4 - Apache Zeppelin & Scala Notebooks

Posted on Mon 29 August 2016 in data-science • Tagged with AWS, data-processing


Parts one, two and three of this series of posts have taken us from creating an account on AWS to loading and interacting with data in Spark via R and R Studio. My vision of a Data Science platform for R&D is nearly complete - the only outstanding component is …


Continue reading

Building a Data Science Platform for R&D, Part 3 - R, R Studio Server, SparkR & Sparklyr

Posted on Mon 22 August 2016 in data-science • Tagged with AWS, data-processing, apache-spark


Part 1 and Part 2 of this series dealt with setting up AWS, loading data into S3, deploying a Spark cluster and using it to access our data. In this part we will deploy R and R Studio Server to our Spark cluster’s master node and use it to …
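To give a sense of where this ends up, a minimal sparklyr sketch of connecting from the master node and pushing a computation out to the cluster (the spark:// URL below is a placeholder for the cluster's actual master address):

```r
library(sparklyr)
library(dplyr)

# connect to the cluster (replace the placeholder with the real master URL)
sc <- spark_connect(master = "spark://master-node:7077")

# copy a local data frame into Spark and run a simple aggregation on the cluster
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_spark")
mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

spark_disconnect(sc)
```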


Continue reading

Building a Data Science Platform for R&D, Part 2 - Deploying Spark on AWS using Flintrock

Posted on Thu 18 August 2016 in data-science • Tagged with AWS, data-processing, apache-spark


Part 1 in this series of blog posts describes how to set up AWS with some basic security and then load data into S3. This post walks through the process of setting up a Spark cluster on AWS and accessing our S3 data from within Spark.

A key part of my vision …


Continue reading

Building a Data Science Platform for R&D, Part 1 - Setting-Up AWS

Posted on Tue 16 August 2016 in data-science • Tagged with AWS, data-processing


Here’s my vision: I get into the office and switch on my laptop; then I start up my Spark cluster; I interact with it via RStudio to explore a new dataset a client uploaded overnight; after getting a handle on what I want to do with it, I prototype an ETL …


Continue reading