[ddj] SQL Vs Excel Vs Refine

Michael Bauer michael.bauer at okfn.org
Mon Apr 29 07:17:55 UTC 2013


Andrew, Ed,

I'd beg to differ here. There is one situation where you want to start
using a database: as soon as your data has relationships.

What I mean by this: say you have a list of companies and their employees.
You could duplicate the company record for every employee, making one huge
table in Excel, but it is more efficient to work with two tables in SQL
with a clear relationship between them. Or take the case where many
companies can each have several owners (this is where it starts to get
complicated) and several products.
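To make the companies/employees case concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table names, column names and sample rows are all made up for illustration. Each company is stored once, each employee row points at its company by id, and a JOIN rebuilds the "one big table" view only when you ask for it.

```python
import sqlite3

# In-memory database for illustration; table/column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT
);
CREATE TABLE employees (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    company_id INTEGER REFERENCES companies(id)  -- link to the company row
);
""")

# Each company is stored exactly once...
conn.executemany("INSERT INTO companies VALUES (?, ?, ?)",
                 [(1, "Acme Ltd", "Sydney"), (2, "Globex", "Melbourne")])
# ...and each employee only stores the id of its company.
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, "Alice", 1), (2, "Bob", 1), (3, "Carol", 2)])

# A JOIN produces the flat, spreadsheet-style view on demand,
# without the company details being duplicated in storage.
rows = conn.execute("""
    SELECT e.name, c.name, c.city
    FROM employees AS e
    JOIN companies AS c ON c.id = e.company_id
    ORDER BY e.name
""").fetchall()

for row in rows:
    print(row)
```

Because Acme Ltd's city lives in a single row, correcting it is one UPDATE rather than a find-and-replace across every employee row that mentions it.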

While all of this can be done in Excel, at some point the workflow becomes
too complicated. This is where SQL queries come in handy.
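For the many-owners case, the usual relational approach is a junction table with one row per (company, owner) pair. The sketch below again uses Python's sqlite3 with hypothetical names and sample rows, and answers a question that is awkward to answer across several spreadsheet sheets: which owners hold stakes in more than one company?

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE owners    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Junction table: one row per (company, owner) pair.
CREATE TABLE ownership (
    company_id INTEGER REFERENCES companies(id),
    owner_id   INTEGER REFERENCES owners(id),
    share_pct  REAL,
    PRIMARY KEY (company_id, owner_id)
);
""")
conn.executemany("INSERT INTO companies VALUES (?, ?)",
                 [(1, "Acme Ltd"), (2, "Globex"), (3, "Initech")])
conn.executemany("INSERT INTO owners VALUES (?, ?)",
                 [(1, "Dana"), (2, "Eli"), (3, "Fay")])
conn.executemany("INSERT INTO ownership VALUES (?, ?, ?)",
                 [(1, 1, 60.0), (1, 2, 40.0),   # Acme Ltd: Dana and Eli
                  (2, 2, 100.0),                # Globex: Eli alone
                  (3, 1, 50.0), (3, 3, 50.0)])  # Initech: Dana and Fay

# Owners who appear in more than one company.
multi = conn.execute("""
    SELECT o.name, COUNT(*) AS n_companies
    FROM owners AS o
    JOIN ownership AS w ON w.owner_id = o.id
    GROUP BY o.id, o.name
    HAVING COUNT(*) > 1
    ORDER BY o.name
""").fetchall()
print(multi)  # → [('Dana', 2), ('Eli', 2)]
```

Adding products would follow the same pattern: a products table plus, if products can belong to several companies, another junction table.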

I run into these problems quite often when working with data: if I need to
keep it at spreadsheet level, I always end up with multiple sheets and
things tend to get complicated.

Michael

On Mon, Apr 29, 2013 at 04:10:23PM +1000, Andrew Duffy wrote:
> Thanks Ed. I'm getting pretty decent with scraping and regex, but I'm still
> wondering whether to dive into SQL. On the surface it looks like
> Excel could do something pretty similar to the SELECT query. It seems SQL
> is only really needed for incredibly large datasets?
> 
> 
> On Mon, Apr 29, 2013 at 3:46 PM, M. Edward (Ed) Borasky <znmeb at znmeb.net> wrote:
> 
> > I never learned Refine but I did learn SQL. There's a whole lot to
> > learn in SQL, but for data journalism, just being able to do a single
> > SELECT query against a single table may be all you ever need. Really,
> > I'd learn scraping and regular expressions before I went to SQL, given
> > that Excel can handle over a million rows per sheet.
> >
> > On Sun, Apr 28, 2013 at 9:37 PM, Andrew Duffy
> > <andrewjamesduffy at gmail.com> wrote:
> > > Question:
> > >
> > > Are there any data journalists/devs out there that can advise as to whether
> > > it's worth learning SQL? So far a combination of Excel/Google Refine has
> > > been more than enough for dumping, organising, and cleaning my data
> > > projects, but I have only worked with spreadsheets up to ~500 rows.
> > >
> > > What can SQL do that Refine/Excel can't?
> > >
> > > --
> > >
> > > Andrew Duffy - Journalist
> > >
> > >
> > >
> > > _______________________________________________
> > > data-driven-journalism mailing list
> > > data-driven-journalism at lists.okfn.org
> > > http://lists.okfn.org/mailman/listinfo/data-driven-journalism
> > > Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
> > >
> >
> >
> >
> > --
> > Twitter: http://twitter.com/znmeb; Computational Journalism Publishers Workbench
> > http://j.mp/CompJournBench/
> >
> > Get out of the building - and don't come back till you have the order!
> >
> 
> 
> 
> -- 
> 
> *Andrew Duffy - Journalist*
> 
> 0439 972 041
> 
> Reed Business Information



-- 
Data Wrangler with the Open Knowledge Foundation (OKFN.org)
GPG/PGP key: http://tentacleriot.eu/mihi.asc
Twitter: @mihi_tr Skype: mihi_tr



