# My eRum 2018 biggest highlights

On 14-16 May 2018, the European R Users Meeting (eRum) was held in Budapest. I took part as a speaker, with a presentation about time series data mining. **eRum 2018** was a very successful event, and I want to thank the organizers for doing such a great job.

This blog post covers my biggest highlights of the eRum conference and also serves as a list of useful resources.

## Workshops

eRum started with workshops split into 2 blocks of 7 parallel sessions (14 workshops in total). It was difficult to choose just 2 of the 14 to attend, because there were so many interesting topics. I finally chose the DALEX and Keras workshops.

#### DALEX - Descriptive mAchine Learning EXplanations

Great workshop by Przemyslaw Biecek and Mateusz Staniak about tools for exploration, validation, and explanation of complex machine learning models.

Fun on #workshop with DALEX to explain ML models at @erum2018 #erum2018 #rstats #dataviz #DataScience speaker: @smarterpoland pic.twitter.com/hHf9dThcr2

— Peter Laurinec (@petolauri) May 14, 2018

I learned many techniques for diagnosing machine learning models. Techniques for explaining a trained model as a whole, as well as its individual predictions, were all presented here. Workshop resources can be downloaded here:

Various packages were used for these purposes, the list of them follows:
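To give a flavour of the workflow, here is a minimal sketch of explaining a model with DALEX. It is my own illustration and not taken from the workshop materials: the `mtcars` data and the linear model are assumptions chosen only to keep the example self-contained, and the function names `model_parts()` and `predict_parts()` come from recent DALEX releases, so they may differ from the version used in 2018.

```r
library(DALEX)

# Illustrative model (an assumption, not from the workshop): a linear regression on mtcars
predictors <- c("wt", "hp", "qsec")
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Wrap the model, the data and the true response in an explainer object
explainer <- DALEX::explain(
  model,
  data    = mtcars[, predictors],
  y       = mtcars$mpg,
  label   = "linear model",
  verbose = FALSE
)

# Model-level explanation: permutation-based variable importance
importance <- model_parts(explainer)

# Prediction-level explanation: contribution of each variable to one prediction
breakdown <- predict_parts(explainer, new_observation = mtcars[1, predictors])

plot(importance)
plot(breakdown)
```

The same explainer object can wrap almost any model type, which is what makes it easy to compare several models side by side.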

#### Deep learning with Keras

The second workshop I attended, by Aimee Gott and Douglas Ashton, was about using Keras for deep learning. It was a nice introduction to the basic usage of the **Keras** library in **R**. We went through use cases with the Iris dataset and a time series dataset from an accelerometer (using a CNN for training). The materials can be downloaded from here:

## Conference talks

The second and third days of the conference continued with keynote and invited talks, contributed talks and lightning talks. It was really motivating and inspirational to see all the R enthusiasts speak about their projects. It gives me more confidence to contribute to the R ecosystem, or to the data science ecosystem in general. I will briefly mention the 6 talks that fascinated me most.

The talk by Edwin Thoen covered the `recipes` package, which helps with preprocessing data into design (model) matrices. With recipes, you can create an effective preprocessing “pipeline” for your data.
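As a rough illustration of such a pipeline (my own sketch, not code from the talk), the example below centers and scales the numeric predictors of the built-in `iris` dataset. The choice of dataset and of these two steps is an assumption made for brevity.

```r
library(recipes)

# Define the pipeline: the formula sets the roles (outcome vs. predictors)
rec <- recipe(Species ~ ., data = iris) %>%
  step_center(all_predictors()) %>%  # subtract the training mean of each predictor
  step_scale(all_predictors())       # divide by the training standard deviation

# Estimate the required statistics (means, sds) on the training data
rec_prepped <- prep(rec, training = iris)

# Apply the trained pipeline to data (here the same data, for simplicity)
iris_processed <- bake(rec_prepped, new_data = iris)

summary(iris_processed$Sepal.Length)
```

Separating `prep()` from `bake()` is the key design choice: the statistics are estimated once on training data and then reused unchanged on new data, which avoids leaking information from a test set.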

The bombshell by Florian Privé was about using large matrices in R. He created the `bigstatsr` package for fast, parallel manipulation of matrices that are larger than the available RAM.

The great keynote speech by Nathalie Vialaneix was about using unsupervised learning for relational data (or dissimilarity data). She talked about various interesting use cases for her R packages `adjclust` and `SOMbrero` for clustering relational data. The slides can be found here: slides_villavialaneix_ERUM2018.

Unsupervised #learning for relational data, dissimilarities with #rstats packages adjclust and SOMbrero by @Natty_V2 #erum2018 #DataScience #MachineLearning

— Peter Laurinec (@petolauri) May 15, 2018

Great talk! pic.twitter.com/O73rzg7O1z

Afterward, Erin LeDell from H2O talked about automated ensemble learning using the `h2o` package. The `h2o.automl` function offers various interesting options, for example limiting (restricting) the training time spent on creating the ensemble.
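A minimal sketch of limiting the training time could look like the following. It is my own example and not from the talk: the `iris` data and the 30-second budget are arbitrary assumptions, and running it requires a working Java installation for the local H2O cluster.

```r
library(h2o)

# Start (or connect to) a local H2O cluster
h2o.init()

# Upload the example data to the cluster
iris_h2o <- as.h2o(iris)

# Run AutoML with a hard time budget for the whole search
aml <- h2o.automl(
  x = setdiff(names(iris), "Species"),  # predictor columns
  y = "Species",                        # response column
  training_frame = iris_h2o,
  max_runtime_secs = 30,                # stop training new models after ~30 seconds
  seed = 1
)

# Models ranked by cross-validated performance; ensembles usually appear near the top
print(aml@leaderboard)

# The best model found within the time budget
best_model <- aml@leader

h2o.shutdown(prompt = FALSE)
```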

The great machine learning session continued with a talk by Szilard Pafka. His benchmark repositories are well known in the ML community. He talked about gradient boosting frameworks (h2o.gbm, xgboost, lightGBM), and their pros and cons (see repo GBM-perf).

The next day, the most interesting talk for me was by Henrik Bengtsson, about parallel computing in R. His `future` package enables asynchronous and parallel processing. It has many useful applications, for example in Shiny apps.
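The basic idea can be sketched in a few lines (my own toy example, with an artificial `Sys.sleep()` standing in for a slow computation): an expression is sent off to a background R session, and the main session stays free until the result is actually needed.

```r
library(future)

# Evaluate futures in separate background R sessions
plan(multisession, workers = 2)

# Start a slow computation without blocking the main session
f <- future({
  Sys.sleep(1)   # stand-in for a long-running task
  sum(1:100)
})

# The main session is free to do other work in the meantime
message("Future resolved yet? ", resolved(f))

# value() blocks until the result is available
result <- value(f)
print(result)

# Return to ordinary sequential evaluation
plan(sequential)
```

Because the evaluation strategy is set once with `plan()`, the same code can run sequentially, on several local cores, or on a cluster without any other changes.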

## TSrepr talk

As I mentioned at the beginning, I also gave a talk about my `TSrepr` package. I talked about how to use time series representations to do better data mining in R. Slides are here:

The video of the talk:

You can read more about how to use time series representation methods in my previous blog posts:
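As a small taste of what a time series representation is, here is a minimal sketch using Piecewise Aggregate Approximation (PAA) from `TSrepr`. The synthetic sine series and the window size are assumptions chosen only for illustration.

```r
library(TSrepr)

set.seed(1)

# Synthetic noisy time series of length 96 (illustrative data only)
x <- sin(seq(0, 4 * pi, length.out = 96)) + rnorm(96, sd = 0.1)

# PAA: replace every window of q = 8 points by its mean
x_paa <- repr_paa(x, q = 8, func = meanC)

# The representation is much shorter than the original series
length(x)
length(x_paa)

# Compare the original series with its reduced representation
plot(x, type = "l", col = "grey50", main = "Original series vs. PAA")
lines(seq(4.5, 96, by = 8), x_paa, col = "red", type = "b")
```

The reduced series keeps the overall shape while dropping most of the noise, which is why such representations are useful as input for clustering or classification.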

All other talks can be seen on the Budapest Users of R Network channel!