[School-of-data] Introduction -- Simon Cropper

Michael Bauer michael.bauer at okfn.org
Thu Jun 19 08:51:03 UTC 2014


Simon,

On Thu, Jun 19, 2014 at 02:11:06PM +1000, Simon Cropper wrote:
> Michael,
> 
> Sure, I would be interested. I have the bulk of this information at my
> fingertips at present, having done a comprehensive review of the available
> resources for myself. I have also found some documentation on how to
> integrate R with Pandas, and on how to do things in Pandas the same way as
> in R.

I'd target this at the School of Data blog, for people who know a bit of
programming and want to get further into data analysis. It would review the
options, link to additional resources (such as your book reviews on the
topic), and point out strengths and weaknesses.

As for time-lines: Whatever suits you best.

Michael

> 
> I have also been actively reviewing the books available on the topic. Anyone
> interested can see my latest reviews here --
> http://www.simonchristophercropper.com/TechnicalReviews.html
> I am currently reviewing the soon to be published book on "Python for
> Finance" which utilizes Pandas to analyze financial data.
> 
> Where did you expect to publish this information? Who is the target
> audience? What time-lines do you have in mind?
> 
> On 17/06/14 17:38, Michael Bauer wrote:
> >Hi there,
> >
> >On Mon, Jun 16, 2014 at 08:32:42AM +0100, Peter Murray-Rust wrote:
> >>>As I am exploring some new tools in Python, I have thought of doing this
> >>>analysis using Pandas or something similar. The code would be integrated
> >>>into iPython Notebooks so others could view the methodology and augment
> >>>where necessary, and managed in a GitHub repository.
> >>>
> >>>
> >>I'm not (yet?) an expert Pythonista but from the description of the problem
> >>it sounds like you will need multivariate statistical methods. There are
> >>lots of libraries - I would probably point you at R but Pandas points you
> >>at http://statsmodels.sourceforge.net. I would probably start with a
> >>Principal Components method to get an idea of the shape of the data - are
> >>there serious outliers, etc. and then move to classification methods -
> >>supervised and unsupervised, binary and multiple. You're almost certainly
> >>going to have to deal with missing data.
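
The workflow suggested above (handle missing data, run a principal components
step, look for outliers) can be sketched in a few lines. This is only a rough
illustration under stated assumptions: the data frame and its column names are
made up, missing values are filled with the column mean (the crudest possible
choice), and the components come from a plain NumPy SVD rather than from
statsmodels.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 6 samples, 3 variables, one missing value,
# and one deliberately extreme row (index 5).
df = pd.DataFrame(
    {
        "height": [1.0, 2.0, 3.0, 4.0, 5.0, 30.0],
        "weight": [2.1, 3.9, np.nan, 8.2, 9.8, 61.0],
        "width": [0.5, 0.4, 0.6, 0.5, 0.4, 0.5],
    }
)

# Crude missing-data treatment: replace each NaN with its column mean.
filled = df.fillna(df.mean())

# Standardise every column so no variable dominates because of its units.
z = (filled - filled.mean()) / filled.std(ddof=0)

# Principal components via singular value decomposition of the
# standardised data; u * s gives the sample scores on each component.
u, s, vt = np.linalg.svd(z.to_numpy(), full_matrices=False)
scores = pd.DataFrame(u * s, index=df.index, columns=["PC1", "PC2", "PC3"])

# Share of total variance captured by each component.
explained = s**2 / np.sum(s**2)

# The sample furthest from the origin along PC1 is a candidate outlier.
outlier = scores["PC1"].abs().idxmax()

print(explained.round(3))
print(scores.round(2))
print("candidate outlier row:", outlier)
```

Plotting PC1 against PC2 from `scores` is the usual next step for judging the
shape of the data by eye. Mean filling is used here only to keep the sketch
short; it shrinks the variance of the filled column and is not a
recommendation.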
> >
> >A while back we were thinking about introducing a more advanced framework
> >for everyone who gets bored playing with spreadsheets ;) We were debating
> >R vs. Python (although I'm a Python programmer, I did most of my data
> >work in R; pandas didn't exist when I started out). Would you want to
> >write a short introduction to Python/pandas, covering what you need to
> >start out and where to find further resources?
> >
> >Michael
> >
> 
> -- 
> Cheers Simon
> 
>    Simon Cropper - Open Content Creator
> 
>    Free and Open Source Software Workflow Guides
>    ------------------------------------------------------------
>    Introduction               http://www.fossworkflowguides.com
>    GIS Packages           http://www.fossworkflowguides.com/gis
>    bash / Python    http://www.fossworkflowguides.com/scripting

-- 
Data Diva | skype: mihi_tr | @mihi_tr
Open Knowledge | School of Data
http://okfn.org | http://schoolofdata.org 
GPG/PGP key: http://tentacleriot.eu/mihi.asc