[School-of-data] Introduction -- Simon Cropper

Simon Cropper simoncropper at fossworkflowguides.com
Mon Jun 16 14:37:18 UTC 2014

On 16/06/14 17:32, Peter Murray-Rust wrote:


> I'm not (yet?) an expert Pythonista but from the description of the
> problem it sounds like you will need multivariate statistical methods.
> There are lots of libraries - I would probably point you at R but Pandas
> points you at http://statsmodels.sourceforge.net. I would probably start
> with a Principal Components method to get an idea of the shape of the
> data - are there serious outliers, etc. and then move to classification
> methods - supervised and unsupervised, binary and multiple. You're
> almost certainly going to have to deal with missing data.

Looking carefully at the type of values recorded for each nutrient will be
important for understanding the dataset. Which tools to use really
depends on who else is interested in the dataset and the kinds
of questions they want to answer.
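Along the lines Peter suggests, a first principal component gives a quick sense of the shape of the data before any classification. This is a minimal, dependency-free sketch, not a recommended tool. It assumes the nutrient table has already been loaded as complete numeric rows (one food per row, one nutrient per column, no missing values). The function names are hypothetical. It standardizes each column, builds the correlation matrix, and finds the leading eigenvector by power iteration. A library would normally do this, but the steps are the same.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def standardize(rows: Matrix) -> Matrix:
    """Centre each column and scale it to unit sample variance (z-scores)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    # A constant column carries no information, so it is mapped to zeros.
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
            for row in rows]


def covariance(z: Matrix) -> Matrix:
    """Sample covariance of centred columns (a correlation matrix for z-scores)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvector(cov: Matrix, iters: int = 1000,
                        tol: float = 1e-12) -> List[float]:
    """Leading eigenvector of a symmetric PSD matrix by power iteration."""
    p = len(cov)
    # A fixed, uneven start vector makes it unlikely that we start
    # orthogonal to the leading eigenvector (a uniform start would stall
    # on two perfectly negatively correlated columns).
    start = [1.0 + 0.1 * i for i in range(p)]
    norm0 = math.sqrt(sum(x * x for x in start))
    v = [x / norm0 for x in start]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # v lies in the null space, e.g. an all-zero matrix
            return v
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def pc1_summary(rows: Matrix) -> Tuple[List[float], float, List[float]]:
    """Loadings, share of total variance, and per-row scores for component 1."""
    z = standardize(rows)
    cov = covariance(z)
    v = leading_eigenvector(cov)
    p = len(v)
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
    eigval = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    trace = sum(cov[i][i] for i in range(p))
    explained = eigval / trace if trace > 0 else 0.0
    scores = [sum(a * b for a, b in zip(r, v)) for r in z]
    return v, explained, scores
```

Foods whose score sits far from zero relative to the rest are the first places to look for outliers. A large explained share suggests that a few nutrients move together strongly.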

As for missing values, yes, this will be a problem. Some foods
genuinely will not contain a nutrient, while others may simply not have
been assessed for it. I'll need to look carefully at the information
available for each food type and at the reliability of the data.
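Whatever tools end up being used, it helps to keep "not present" and "not assessed" as different values from the start. This is a minimal sketch. It assumes a hypothetical layout in which each food maps nutrient names to numbers: 0.0 means the nutrient was measured and found absent, and None (or a missing key) means it was not assessed. The function names are also hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: food name -> {nutrient: value}.
#   a number (including 0.0) -> the nutrient was measured
#   None or an absent key    -> the nutrient was not assessed
Food = Dict[str, Optional[float]]


def coverage(foods: Dict[str, Food], nutrients: List[str]) -> Dict[str, float]:
    """Fraction of foods in which each nutrient was actually assessed."""
    out = {}
    for n in nutrients:
        assessed = sum(1 for rec in foods.values() if rec.get(n) is not None)
        out[n] = assessed / len(foods) if foods else 0.0
    return out


def split_missing(foods: Dict[str, Food],
                  nutrient: str) -> Tuple[List[str], List[str], List[str]]:
    """Partition foods into (measured zero, measured non-zero, not assessed)."""
    zero, present, unknown = [], [], []
    for name, rec in foods.items():
        value = rec.get(nutrient)
        if value is None:
            unknown.append(name)
        elif value == 0:
            zero.append(name)
        else:
            present.append(name)
    return zero, present, unknown
```

Keeping the unknowns separate means they can later be dropped, imputed or flagged deliberately, rather than silently treated as zeros.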

> Excellent - have you met up with any OKF in Australia - Melbourne is an
> active centre.

No. Point me to a forum, list or contact and I will see what they are up
to. To date, I have approached things from the open source software
side. Open data is only just becoming available in Australia; lots
of datasets are still locked up by restrictive licensing and Data
Supply Agreements.


> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069

Cheers Simon

    Simon Cropper - Open Content Creator

    Free and Open Source Software Workflow Guides
    Introduction               http://www.fossworkflowguides.com
    GIS Packages           http://www.fossworkflowguides.com/gis
    bash / Python    http://www.fossworkflowguides.com/scripting
