[School-of-data] Python, pandas and ipython notebooks

Tue Jun 17 09:57:57 UTC 2014

FWIW, I've been putting together some notebooks around pandas for an
in-production OU course.

We can't share all the notebooks atm (version and quality control issues,
plus things are still at an early stage of development), but I'll try to
post fragments for comment...

The audience is computing students, so the tone and some of the exercises
reflect that...

Here's a draft of a section on pandas series and dataframe structures:
http://nbviewer.ipython.org/github/psychemedia/ou-tm351/blob/master/notebooks-RFC/Pandas%20Intro%20-%20RFC.ipynb
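
For anyone who hasn't met the two structures yet, here is a minimal sketch
of a Series and a DataFrame. The data, labels and variable names are made up
for illustration and are not taken from the notebook:

```python
import pandas as pd

# A Series is a one-dimensional array with a label for each value.
# (Illustrative values only.)
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a table whose columns share a single row index.
# (Hypothetical course data, not from the OU materials.)
df = pd.DataFrame(
    {"course": ["X", "Y", "Z"], "students": [120, 85, 40]},
    index=["r1", "r2", "r3"],
)

# Selecting a single column from a DataFrame gives back a Series.
students = df["students"]

print(s["b"])           # look up a Series value by its label
print(students.sum())   # aggregate over a column
```

Editing the values and re-running is the sort of thing the workbook style
is meant to encourage.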

The notebook is intended to be embedded within other materials and used as
a workbook (if you've ever used Stroud, you may notice the resemblance in
part!). That means reading and working through the activities, running
each cell as you come to it, then perhaps editing the cell you just ran
and running it again.

Any issues or comments, please add them to the tracker at
https://github.com/psychemedia/ou-tm351/issues

tony

[Michael - I'll try to do a notebook intro post for ScoDa blog this week]

On 17/06/2014 08:38, "Michael Bauer" <michael.bauer at okfn.org> wrote:

>Hi there,
>
>On Mon, Jun 16, 2014 at 08:32:42AM +0100, Peter Murray-Rust wrote:
>> > As I am exploring some new tools in Python, I have thought of doing this
>> > analysis using Pandas or something similar. The code would be integrated
>> > into iPython Notebooks so others could view the methodology and augment
>> > where necessary, and managed in a GitHub repository.
>> >
>> I'm not (yet?) an expert Pythonista but from the description of the problem
>> it sounds like you will need multivariate statistical methods. There are
>> lots of libraries - I would probably point you at R but Pandas points you
>> at http://statsmodels.sourceforge.net. I would probably start with a
>> Principal Components method to get an idea of the shape of the data - are
>> there serious outliers, etc. and then move to classification methods -
>> supervised and unsupervised, binary and multiple. You're almost certainly
>> going to have to deal with missing data.
>
>A while back we were thinking about introducing a more advanced framework
>for everyone who gets bored playing with spreadsheets ;) We were debating
>R vs. Python (although I'm a Python programmer, I did most of my data
>work in R, as pandas didn't exist when I started out). Would you want to
>write a short introduction on python/pandas: what you need to start out
>and where to find further resources?
>
>Michael
>
>--
>Data Diva | skype: mihi_tr | @mihi_tr
>Open Knowledge | School of Data
>http://okfn.org | http://schoolofdata.org
>GPG/PGP key: http://tentacleriot.eu/mihi.asc
