GSoC 2019

We reproduce here our call for a student to participate, under our guidance, in Google Summer of Code 2019. The original text is published on a GitHub page under the R-Project umbrella. Students can apply from February 26, 2019 to April 9, 2019, 18:00 UTC, as per the GSoC 2019 timeline.


Background

Most R packages that implement neural networks of perceptron type (one input layer, one normalized layer, one hidden layer with a nonlinear activation function, usually tanh(), one normalized layer, one output layer) for regression purposes (i.e. NN(X1, ..., Xn) = E[Y], as opposed to classification) use very poor learning algorithms and never find the global minimum of the objective function in the parameter space. Most of the time, a first order algorithm is used, whereas neural networks, like any nonlinear function, require a second order algorithm.
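To make the contrast between first and second order algorithms concrete, here is a minimal sketch on a toy problem of our own invention. It is written in Python purely for illustration; the project itself targets R. The model y = a * tanh(b * x), the data, the starting point, the step size and all function names are assumptions made for this example and are not taken from any R package. Gradient descent stands in for the first order family, and Levenberg-Marquardt, which uses curvature information through the Gauss-Newton approximation J^T J, stands in for the second order family.

```python
import math

# Hypothetical toy problem: noiseless data generated with a = 2.0, b = 0.5.
XS = [-3.0 + 0.3 * i for i in range(21)]
YS = [2.0 * math.tanh(0.5 * x) for x in XS]


def residuals(a, b):
    """Model output minus target for every data point."""
    return [a * math.tanh(b * x) - y for x, y in zip(XS, YS)]


def sse(a, b):
    """Sum of squared errors, the objective function."""
    return sum(r * r for r in residuals(a, b))


def jacobian(a, b):
    """Rows (df/da, df/db) of the model at every data point."""
    rows = []
    for x in XS:
        t = math.tanh(b * x)
        rows.append((t, a * x * (1.0 - t * t)))
    return rows


def gradient_descent(a, b, steps=100, lr=0.002):
    """First order: move against the gradient with a fixed step size."""
    for _ in range(steps):
        r = residuals(a, b)
        jac = jacobian(a, b)
        grad_a = 2.0 * sum(ri * ja for ri, (ja, _) in zip(r, jac))
        grad_b = 2.0 * sum(ri * jb for ri, (_, jb) in zip(r, jac))
        a, b = a - lr * grad_a, b - lr * grad_b
    return a, b


def levenberg_marquardt(a, b, steps=100, lam=1e-3):
    """Curvature-aware: damped Gauss-Newton steps with accept/reject."""
    current = sse(a, b)
    for _ in range(steps):
        r = residuals(a, b)
        jac = jacobian(a, b)
        # Entries of J^T J (curvature approximation) and J^T r.
        h_aa = sum(ja * ja for ja, _ in jac)
        h_ab = sum(ja * jb for ja, jb in jac)
        h_bb = sum(jb * jb for _, jb in jac)
        g_a = sum(ja * ri for (ja, _), ri in zip(jac, r))
        g_b = sum(jb * ri for (_, jb), ri in zip(jac, r))
        # Solve (J^T J + lam * I) * delta = -J^T r for the 2x2 case.
        m_aa, m_bb = h_aa + lam, h_bb + lam
        det = m_aa * m_bb - h_ab * h_ab
        d_a = (g_b * h_ab - g_a * m_bb) / det
        d_b = (g_a * h_ab - g_b * m_aa) / det
        trial = sse(a + d_a, b + d_b)
        if trial < current:
            # Step accepted: trust the curvature model more next time.
            a, b, current = a + d_a, b + d_b, trial
            lam *= 0.1
        else:
            # Step rejected: increase the damping and try again.
            lam *= 10.0
    return a, b


if __name__ == "__main__":
    for name, method in [("gradient descent", gradient_descent),
                         ("Levenberg-Marquardt", levenberg_marquardt)]:
        a, b = method(1.0, 1.0)
        print(name, "a =", a, "b =", b, "SSE =", sse(a, b))
```

Both methods receive the same starting point and the same iteration budget. A single toy problem says nothing about any particular package; the sketch only shows the kind of comparison the benchmark is meant to formalize.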

In 2015, Patrice Kiener conducted a private benchmark on more than twenty R packages with a few known datasets. The result was a disaster. More than 18 packages did not converge correctly and only 2 packages found the right values. We feel that an updated and more formal evaluation should be carried out and communicated to the whole R community. We therefore invite a student to apply for this new benchmark under our guidance and to publish the results in one task view and in the R-Journal.

Related work

Such a benchmark has never been conducted on R packages so far. This work is acknowledged by the AIM SIG, which received some funding from the R-Consortium (section PSI application for collaboration to create online R package validation repository). Some connections can also be established with the histoRicalg project.

Details of your coding project

We expect a student with a sound knowledge of nonlinear regression algorithms (BFGS, Levenberg-Marquardt). The purpose of this work is (1) to benchmark 20 to 30 R packages with 3 to 5 simple datasets and (2) to write a comprehensive report on the performance of each package.

(1) Simple code is to be written to call each package, test it against the datasets and collect the results. This can lead to a meta-package that eases the benchmark procedure and makes it possible to perform other benchmarks in the future.
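The call, test and collect loop of item (1) could be organized along the following lines. This is only a structural sketch under our own assumptions, written in Python for illustration; the real harness would be written in R, with one entry per package calling that package's training function. The two fitters, the two datasets and every name below are hypothetical stand-ins, not actual packages or benchmark data.

```python
import math
import time


def fit_mean(xs, ys):
    """Stand-in 'package' 1: always predicts the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Stand-in 'package' 2: ordinary least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def rmse(predict, xs, ys):
    """Root mean squared error of a fitted predictor on a dataset."""
    total = sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))


def run_benchmark(fitters, datasets):
    """Fit every entry on every dataset; return one result row per pair."""
    rows = []
    for data_name, (xs, ys) in datasets.items():
        for fit_name, fit in fitters.items():
            start = time.perf_counter()
            try:
                predict = fit(xs, ys)
                score, error = rmse(predict, xs, ys), None
            except Exception as exc:
                # A failing entry is recorded, not allowed to stop the run.
                score, error = float("nan"), repr(exc)
            rows.append({
                "dataset": data_name,
                "fitter": fit_name,
                "rmse": score,
                "seconds": time.perf_counter() - start,
                "error": error,
            })
    return rows


# Hypothetical datasets, chosen only to exercise the loop.
GRID = [float(i) for i in range(10)]
DATASETS = {
    "line": (GRID, [1.0 + 2.0 * x for x in GRID]),
    "tanh": (GRID, [math.tanh(x - 5.0) for x in GRID]),
}
FITTERS = {"mean": fit_mean, "line": fit_line}

if __name__ == "__main__":
    for row in run_benchmark(FITTERS, DATASETS):
        print(row)
```

Keeping the fitters in a named registry means a new package is added by appending one entry, and recording failures as rows keeps non-converging or crashing packages visible in the final report.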

(2) The biggest effort will go into writing up the results in a clear report, if possible directly in the R-Journal format (which can be easily accessed through the "rticles" package). An introduction to both neural networks and optimization methods is expected.

Expected impact

With this work, we wish to alert R users about the varying performance of neural network packages. Users, both from academia and private companies, should be aware of the strengths and the weaknesses of the packages they use. We expect a bigger impact on package maintainers and authors and hope that such a benchmark will convince them to shift to better algorithms.

Neural networks are understood as black boxes, especially nowadays with the advent of machine learning and artificial intelligence procedures, but a minimum of care and a sound mathematical approach should be taken when writing an R package.

Mentors

Students, please contact mentors below after completing at least one of the tests below.

- Patrice Kiener (Email: gsoc2019@inmodelia.com) is the author of the R package FatTailsR and has 18 years of experience with neural networks of perceptron type.

- Christophe Dutang (Email: gsoc2019@inmodelia.com) has authored or contributed to more than 10 packages and maintains the task views related to Probability Distributions and Extreme Value Analysis. He also had previous GSoC experience with the markovchain package in 2015 and 2016.

Tests

Students, please do one or more of the following tests before contacting the mentors above.

- Can you explain the difference between first and second order algorithms?

- Can you cite a few books on this topic? Which ones have you read? Understood?

- Is back-propagation necessary for neural networks?

- Is back-propagation useful for neural networks? Why?

- How many local minima can you expect after regression?

- In case of several outputs, why is it better to have one model per output?

- Give a few examples of your R code.

- Have you ever contributed to an R package? If yes, which one?


If several students have an equal score after this first series of questions, a few other (unpublished) questions will be asked.

Solution of tests

Students, please send your test results to Patrice Kiener (Email: gsoc2019@inmodelia.com).


List your answers to the various questions in the body of the email. For the R code, attach at most one .R file, or one .7z compressed file that bundles a few .R (or .Rmd) files.

© Copyright InModelia 2009 - 2018