[School-of-data] Introduction -- Simon Cropper

Michael Bauer michael.bauer at okfn.org
Tue Jun 17 07:38:36 UTC 2014


Hi there,

On Mon, Jun 16, 2014 at 08:32:42AM +0100, Peter Murray-Rust wrote:
> > As I am exploring some new tools in Python, I have thought of doing this
> > analysis using Pandas or something similar. The code would be integrated
> > into iPython Notebooks so others could view the methodology and augment
> > where necessary, and managed in a GitHub repository.
> >
> >
> I'm not (yet?) an expert Pythonista but from the description of the problem
> it sounds like you will need multivariate statistical methods. There are
> lots of libraries - I would probably point you at R but Pandas points you
> at http://statsmodels.sourceforge.net. I would probably start with a
> Principal Components method to get an idea of the shape of the data - are
> there serious outliers, etc. and then move to classification methods -
> supervised and unsupervised, binary and multiple. You're almost certainly
> going to have to deal with missing data.
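
For anyone following the thread, the workflow suggested above (handle the
missing data, then look at principal components to get a feel for the shape
of the data) could be sketched roughly as below. This is only a minimal
sketch: the table, its column names and its values are invented for
illustration, mean imputation is just one simple way to treat missing
values, and the PCA is done with a plain NumPy SVD rather than any
particular statistics library.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
df = pd.DataFrame({
    "height": [1.0, 2.0, np.nan, 4.0, 5.0],
    "weight": [2.1, 3.9, 6.2, np.nan, 10.1],
    "age":    [5.0, 3.0, 4.0, 2.0, 1.0],
})

# Simple missing-data handling (an assumption): fill gaps with the column mean.
filled = df.fillna(df.mean())

# Standardise each column to mean 0 and unit variance.
z = (filled - filled.mean()) / filled.std(ddof=0)

# PCA via singular value decomposition of the standardised matrix.
u, s, vt = np.linalg.svd(z.to_numpy(), full_matrices=False)
scores = u * s                     # each row projected onto the components
explained = s**2 / np.sum(s**2)    # share of total variance per component

print(explained.round(3))
```

Plotting the first two columns of `scores` against each other is a quick way
to spot serious outliers before moving on to classification methods.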

A while back we were thinking about introducing a more advanced framework
for everyone who gets bored playing with spreadsheets ;) We were debating
R vs. Python. (Although I'm a Python programmer, I did most of my data
work in R, since pandas didn't exist when I started out.) Would you want to
write a short introduction to Python/pandas: what you need to start out and
where to find further resources?

Michael

-- 
Data Diva | skype: mihi_tr | @mihi_tr
Open Knowledge | School of Data
http://okfn.org | http://schoolofdata.org 
GPG/PGP key: http://tentacleriot.eu/mihi.asc


