[ddj] SQL Vs Excel Vs Refine

Marcelo Fernandes marcelo.fernandes at infoglobo.com.br
Mon Apr 29 14:47:45 UTC 2013


Hi Andrew et al.,

When we work with relational databases and a language like SQL your
decision is to keep your data without redundancy and in a standard format.
We do that in order to easly deal with the data in the sense of not only
querying but also change existing data and adding new ones. To better
understand such philosophy please start reading
https://en.wikipedia.org/wiki/Database_normalization

Working of spreadsheets and data manipulation tools such as Google Refine
you are just creating projections or generating views of the existing data.
Although what these tools do is somehow similar no SQL there is a
fundamental difference which is the way you store/manage data. It's way
easier and faster to deal with SQL than dealing with a plenty of distinct
file types and formats.

One good approach that might help you decide about using a relational
database or not would be creating a conceptual model of the information you
plan to deal with. Depending on the number of entities/concepts, their
relationships and the questions you want to be asked you can make a well
informed decision. It's not just a matter of the size (in MegaBytes or
numer of rows) of you data. The parameters to make this decision are in a
higher abstration level. This book http://goo.gl/5Wvxr seems to talk about
it.

A step further would be dealing with really big quantity of data. That
would require a Data Warehouse
<http://en.wikipedia.org/wiki/Data_warehouse>approach or a so called
Big
Data <http://en.wikipedia.org/wiki/Big_data>, both with their own specific
way of modeling and dealing with data. For example, Data Warehouses deal
with non normalized data because data in this approach would be stored
optmized for queries since change is not welcome.

Among all infomormation you've been discussing in this thread I don't see
how Semantic Web, SPARQL and Triple Stores could help you. Ok, it's cool to
know at least a little bit of them but they are all about sharing and
linking data amonst different but related conceptual models / semantics.
Just don't follow this path right now, I suggest.

I hope it helps!


---
Marcelo Carvalho Fernandes
Consultoria e Arquitetura de TI
Infoglobo Comunicação e Participações S.A.
+55 21 2534-9718


2013/4/29 <SMachlis at computerworld.com>

> What I like about SQL:
>
> = It was *designed* for subsetting, slicing and dicing data. Yes, I can do
> this to a large degree with Excel and Google Refine; but with a more
> complex project -- especially as others have pointed out, with data having
> one or more relationships between multiple tables -- there are times that I
> find that using a tool designed for the job to be less frustrating and
> considerably more robust.
>
> = If I am dealing with a large data set that is already in multiple
> tables, SQL makes more sense to be than trying to shoehorn that data into
> an Excel-friendly format.
>
> = It helps me think about data in a more structured way, which is very
> useful when I've got projects where I'm collecting and storing my own data.
>
> = It helps me understand what sorts of data I can and can't reasonably
> request from government agencies that store their data in structured
> databases.
>
> = If I am sharing data with colleagues, sometimes it's useful to be able
> to put up a simple PHP/MySQL app on our intranet (Rails or Jango might be a
> better choice for this, but the shared internal server I have access to
> does not include those platforms). Even if I'm creating a Web application
> with a third-party service such as Caspio, I find it helpful to be able to
> think about data in relational terms.
>
> = Having a series of SQL commands I can store in a file makes it easier
> for me or others to go back and check my work, versus a series of Excel
> point-and-click operations (or even multiple macros buried in Excel).
>
> Sharon Machlis
>
> ________________________________________
> From: data-driven-journalism-bounces at lists.okfn.org [
> data-driven-journalism-bounces at lists.okfn.org] On Behalf Of Andrew Duffy [
> andrewjamesduffy at gmail.com]
> Sent: Monday, April 29, 2013 12:37 AM
> To: data-driven-journalism at lists.okfn.org
> Subject: [ddj] SQL Vs Excel Vs Refine
>
> Question:
>
> Are there any data journalists/devs out there that can advise as to
> whether it's worth learning SQL? So far a combination of Excel/Google
> Refine has been more than enough for dumping, organising, and cleaning my
> data projects, but I have only worked with spreadsheets up to ~500 rows.
>
> What can SQL do that refine/excel can't?
>
> --
>
> Andrew Duffy - Journalist
>
>
>
> _______________________________________________
> data-driven-journalism mailing list
> data-driven-journalism at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/data-driven-journalism
> Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
>

- - - - - - - - - - - - - - - - - - - - AVISO IMPORTANTE / IMPORTANT NOTICE - - - - - - - - - - - - - - - - - - - - - -

Esta mensagem pode conter informações confidenciais e somente o indivíduo ou entidade a quem foi destinada pode 
utilizá-la. A transmissão incorreta da mensagem não acarreta a perda de sua confidencialidade. Caso esta mensagem 
tenha sido recebida por engano, solicitamos que o fato seja comunicado ao remetente e que a mensagem seja eliminada 
de seu sistema imediatamente. É vedado a qualquer pessoa que não seja o destinatário usar, revelar, distribuir 
ou copiar qualquer parte desta mensagem. Ambiente de comunicação sujeito a monitoramento. 



This message may include confidential information and only the intended addressee have the right to use it as is, or 
any part of it. A wrong transmission does not break its confidentiality. If you've received it because of a mistake or 
erroneous transmission, please notify the sender and delete it from your system immediately. This communication environment 
is controlled and monitored. 



- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/data-driven-journalism/attachments/20130429/5e20c0a0/attachment-0001.html>


More information about the data-driven-journalism mailing list