[ddj] SQL Vs Excel Vs Refine

Eric Sagara esagara at gmail.com
Mon Apr 29 15:53:24 UTC 2013


Hello all,

A lot of people raised some good points for learning SQL and I just wanted to point out some other reasons aside from working with large/relational databases.

A lot of programs use their own flavor of SQL to query and manipulate data. A few that come immediately to mind are Microsoft Access, ArcGIS and QGIS. 
Knowing a little SQL ( and it really doesn't take much) will go a long way as you start to pick up new tools. 

Additionally it helps to know SQL when looking at how other people are working with data that you may have an interest in. I've seen SQL queries mentioned in the methodologies of studies which can help determine the validity of a study.  They have also come in handy while negotiating for data from government agencies.

Hope that helps,

Eric Sagara

Sent from my iPhone

On Apr 29, 2013, at 10:47 AM, Marcelo Fernandes <marcelo.fernandes at infoglobo.com.br> wrote:

> Hi Andrew et al.,
> 
> When we work with relational databases and a language like SQL your decision is to keep your data without redundancy and in a standard format. We do that in order to easly deal with the data in the sense of not only querying but also change existing data and adding new ones. To better understand such philosophy please start reading https://en.wikipedia.org/wiki/Database_normalization
> 
> Working of spreadsheets and data manipulation tools such as Google Refine you are just creating projections or generating views of the existing data. Although what these tools do is somehow similar no SQL there is a fundamental difference which is the way you store/manage data. It's way easier and faster to deal with SQL than dealing with a plenty of distinct file types and formats.
> 
> One good approach that might help you decide about using a relational database or not would be creating a conceptual model of the information you plan to deal with. Depending on the number of entities/concepts, their relationships and the questions you want to be asked you can make a well informed decision. It's not just a matter of the size (in MegaBytes or numer of rows) of you data. The parameters to make this decision are in a higher abstration level. This book http://goo.gl/5Wvxr seems to talk about it.
> 
> A step further would be dealing with really big quantity of data. That would require a Data Warehouse approach or a so called Big Data, both with their own specific way of modeling and dealing with data. For example, Data Warehouses deal with non normalized data because data in this approach would be stored optmized for queries since change is not welcome.
> 
> Among all infomormation you've been discussing in this thread I don't see how Semantic Web, SPARQL and Triple Stores could help you. Ok, it's cool to know at least a little bit of them but they are all about sharing and linking data amonst different but related conceptual models / semantics. Just don't follow this path right now, I suggest.
> 
> I hope it helps!
> 
> 
> ---
> Marcelo Carvalho Fernandes
> Consultoria e Arquitetura de TI
> Infoglobo Comunicação e Participações S.A.
> +55 21 2534-9718
> 
> 
> 2013/4/29 <SMachlis at computerworld.com>
>> What I like about SQL:
>> 
>> = It was *designed* for subsetting, slicing and dicing data. Yes, I can do this to a large degree with Excel and Google Refine; but with a more complex project -- especially as others have pointed out, with data having one or more relationships between multiple tables -- there are times that I find that using a tool designed for the job to be less frustrating and considerably more robust.
>> 
>> = If I am dealing with a large data set that is already in multiple tables, SQL makes more sense to be than trying to shoehorn that data into an Excel-friendly format.
>> 
>> = It helps me think about data in a more structured way, which is very useful when I've got projects where I'm collecting and storing my own data.
>> 
>> = It helps me understand what sorts of data I can and can't reasonably request from government agencies that store their data in structured databases.
>> 
>> = If I am sharing data with colleagues, sometimes it's useful to be able to put up a simple PHP/MySQL app on our intranet (Rails or Jango might be a better choice for this, but the shared internal server I have access to does not include those platforms). Even if I'm creating a Web application with a third-party service such as Caspio, I find it helpful to be able to think about data in relational terms.
>> 
>> = Having a series of SQL commands I can store in a file makes it easier for me or others to go back and check my work, versus a series of Excel point-and-click operations (or even multiple macros buried in Excel).
>> 
>> Sharon Machlis
>> 
>> ________________________________________
>> From: data-driven-journalism-bounces at lists.okfn.org [data-driven-journalism-bounces at lists.okfn.org] On Behalf Of Andrew Duffy [andrewjamesduffy at gmail.com]
>> Sent: Monday, April 29, 2013 12:37 AM
>> To: data-driven-journalism at lists.okfn.org
>> Subject: [ddj] SQL Vs Excel Vs Refine
>> 
>> Question:
>> 
>> Are there any data journalists/devs out there that can advise as to whether it's worth learning SQL? So far a combination of Excel/Google Refine has been more than enough for dumping, organising, and cleaning my data projects, but I have only worked with spreadsheets up to ~500 rows.
>> 
>> What can SQL do that refine/excel can't?
>> 
>> --
>> 
>> Andrew Duffy - Journalist
>> 
>> 
>> 
>> _______________________________________________
>> data-driven-journalism mailing list
>> data-driven-journalism at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/data-driven-journalism
>> Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
> 
> - - - - - - - - - - - - - - - - - - - - AVISO IMPORTANTE / IMPORTANT NOTICE - - - - - - - - - - - - - - - - - - - - - -
> 
> Esta mensagem pode conter informações confidenciais e somente o indivíduo ou entidade a quem foi destinada pode 
> utilizá-la. A transmissão incorreta da mensagem não acarreta a perda de sua confidencialidade. Caso esta mensagem 
> tenha sido recebida por engano, solicitamos que o fato seja comunicado ao remetente e que a mensagem seja eliminada 
> de seu sistema imediatamente. É vedado a qualquer pessoa que não seja o destinatário usar, revelar, distribuir 
> ou copiar qualquer parte desta mensagem. Ambiente de comunicação sujeito a monitoramento. 
> 
> 
> 
> This message may include confidential information and only the intended addressee have the right to use it as is, or 
> any part of it. A wrong transmission does not break its confidentiality. If you've received it because of a mistake or 
> erroneous transmission, please notify the sender and delete it from your system immediately. This communication environment 
> is controlled and monitored. 
> 
> 
> 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> _______________________________________________
> data-driven-journalism mailing list
> data-driven-journalism at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/data-driven-journalism
> Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/data-driven-journalism/attachments/20130429/6170ca9f/attachment-0001.html>


More information about the data-driven-journalism mailing list