[School-of-data] Introduction -- Simon Cropper
Simon Cropper
simoncropper at fossworkflowguides.com
Mon Jun 16 14:37:18 UTC 2014
On 16/06/14 17:32, Peter Murray-Rust wrote:
>
[snip]
> I'm not (yet?) an expert Pythonista but from the description of the
> problem it sounds like you will need multivariate statistical methods.
> There are lots of libraries - I would probably point you at R but Pandas
> points you at http://statsmodels.sourceforge.net. I would probably start
> with a Principal Components method to get an idea of the shape of the
> data - are there serious outliers, etc. and then move to classification
> methods - supervised and unsupervised, binary and multiple. You're
> almost certainly going to have to deal with missing data.
Looking carefully at the type of values recorded for each nutrient will
be important for understanding the dataset. Which tools are used really
depends on the other people interested in the dataset and the type
of questions they want to answer.
With regard to missing values -- yes, this will be a problem. Some foods
genuinely lack a nutrient, while for others it may simply not have been
assessed. I'll need to look carefully at the type of information that is
available for each food type and the reliability of the data.
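As a rough sketch of what that first check might look like in Python: the
records, field names and values below are invented purely for illustration
and are not taken from the real dataset. It assumes a "not assessed" value
is stored as None and a genuine absence as 0, which may not match how the
actual data is coded.

```python
# Hypothetical records: None = nutrient not assessed, 0.0 = measured and absent.
# All names and numbers are placeholders, not real nutrient data.
foods = [
    {"food": "apple", "vitamin_c": 4.6, "b12": 0.0, "iron": 0.1},
    {"food": "beef", "vitamin_c": None, "b12": 2.6, "iron": 2.6},
    {"food": "rice", "vitamin_c": 0.0, "b12": None, "iron": None},
]


def summarise_missing(records, key_field="food"):
    """Count, per nutrient, how many values are not assessed, zero, or present."""
    summary = {}
    for record in records:
        for nutrient, value in record.items():
            if nutrient == key_field:
                continue  # skip the identifying column
            counts = summary.setdefault(
                nutrient, {"not_assessed": 0, "true_zero": 0, "present": 0}
            )
            if value is None:
                counts["not_assessed"] += 1
            elif value == 0:
                counts["true_zero"] += 1
            else:
                counts["present"] += 1
    return summary


if __name__ == "__main__":
    for nutrient, counts in summarise_missing(foods).items():
        print(nutrient, counts)
```

A per-nutrient tally like this would show which nutrients are mostly
unassessed before anyone decides whether to drop them, impute them or treat
them separately.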
> Excellent - have you met up with any OKF in Australia - Melbourne is an
> active centre.
No. Point me to a forum, list or contact and I will see what they are up
to. To date, I have approached this from the open source software side
of things. Open data is only just becoming available in Australia. Lots
of datasets are still locked up with restrictive licensing and Data
Supply Agreements.
[snip]
> Peter Murray-Rust
> Reader in Molecular Informatics
> Unilever Centre, Dep. Of Chemistry
> University of Cambridge
> CB2 1EW, UK
> +44-1223-763069
--
Cheers Simon
Simon Cropper - Open Content Creator
Free and Open Source Software Workflow Guides
------------------------------------------------------------
Introduction http://www.fossworkflowguides.com
GIS Packages http://www.fossworkflowguides.com/gis
bash / Python http://www.fossworkflowguides.com/scripting