[ddj] SQL Vs Excel Vs Refine

David F. Flanders david.flanders at ands.org.au
Mon Apr 29 22:02:15 UTC 2013


As a side note, is anyone using SPARQL to join data together for stories?

/flanders
On Apr 30, 2013 2:15 AM, "Eric Sagara" <esagara at gmail.com> wrote:

> Hello all,
>
> A lot of people raised some good points for learning SQL and I just wanted
> to point out some other reasons aside from working with large/relational
> databases.
>
> A lot of programs use their own flavor of SQL to query and manipulate
> data. A few that come immediately to mind are Microsoft Access, ArcGIS and
> QGIS.
> Knowing a little SQL ( and it really doesn't take much) will go a long way
> as you start to pick up new tools.
>
> Additionally it helps to know SQL when looking at how other people are
> working with data that you may have an interest in. I've seen SQL queries
> mentioned in the methodologies of studies which can help determine the
> validity of a study.  They have also come in handy while negotiating for
> data from government agencies.
>
> Hope that helps,
>
> Eric Sagara
>
> Sent from my iPhone
>
> On Apr 29, 2013, at 10:47 AM, Marcelo Fernandes <
> marcelo.fernandes at infoglobo.com.br> wrote:
>
> Hi Andrew et al.,
>
> When we work with relational databases and a language like SQL your
> decision is to keep your data without redundancy and in a standard format.
> We do that in order to easly deal with the data in the sense of not only
> querying but also change existing data and adding new ones. To better
> understand such philosophy please start reading
> https://en.wikipedia.org/wiki/Database_normalization
>
> Working of spreadsheets and data manipulation tools such as Google Refine
> you are just creating projections or generating views of the existing data.
> Although what these tools do is somehow similar no SQL there is a
> fundamental difference which is the way you store/manage data. It's way
> easier and faster to deal with SQL than dealing with a plenty of distinct
> file types and formats.
>
> One good approach that might help you decide about using a relational
> database or not would be creating a conceptual model of the information you
> plan to deal with. Depending on the number of entities/concepts, their
> relationships and the questions you want to be asked you can make a well
> informed decision. It's not just a matter of the size (in MegaBytes or
> numer of rows) of you data. The parameters to make this decision are in a
> higher abstration level. This book http://goo.gl/5Wvxr seems to talk
> about it.
>
> A step further would be dealing with really big quantity of data. That
> would require a Data Warehouse<http://en.wikipedia.org/wiki/Data_warehouse>approach or a so called Big
> Data <http://en.wikipedia.org/wiki/Big_data>, both with their own
> specific way of modeling and dealing with data. For example, Data
> Warehouses deal with non normalized data because data in this approach
> would be stored optmized for queries since change is not welcome.
>
> Among all infomormation you've been discussing in this thread I don't see
> how Semantic Web, SPARQL and Triple Stores could help you. Ok, it's cool to
> know at least a little bit of them but they are all about sharing and
> linking data amonst different but related conceptual models / semantics.
> Just don't follow this path right now, I suggest.
>
> I hope it helps!
>
>
> ---
> Marcelo Carvalho Fernandes
> Consultoria e Arquitetura de TI
> Infoglobo Comunicação e Participações S.A.
> +55 21 2534-9718
>
>
> 2013/4/29 <SMachlis at computerworld.com>
>
>> What I like about SQL:
>>
>> = It was *designed* for subsetting, slicing and dicing data. Yes, I can
>> do this to a large degree with Excel and Google Refine; but with a more
>> complex project -- especially as others have pointed out, with data having
>> one or more relationships between multiple tables -- there are times that I
>> find that using a tool designed for the job to be less frustrating and
>> considerably more robust.
>>
>> = If I am dealing with a large data set that is already in multiple
>> tables, SQL makes more sense to be than trying to shoehorn that data into
>> an Excel-friendly format.
>>
>> = It helps me think about data in a more structured way, which is very
>> useful when I've got projects where I'm collecting and storing my own data.
>>
>> = It helps me understand what sorts of data I can and can't reasonably
>> request from government agencies that store their data in structured
>> databases.
>>
>> = If I am sharing data with colleagues, sometimes it's useful to be able
>> to put up a simple PHP/MySQL app on our intranet (Rails or Jango might be a
>> better choice for this, but the shared internal server I have access to
>> does not include those platforms). Even if I'm creating a Web application
>> with a third-party service such as Caspio, I find it helpful to be able to
>> think about data in relational terms.
>>
>> = Having a series of SQL commands I can store in a file makes it easier
>> for me or others to go back and check my work, versus a series of Excel
>> point-and-click operations (or even multiple macros buried in Excel).
>>
>> Sharon Machlis
>>
>> ________________________________________
>> From: data-driven-journalism-bounces at lists.okfn.org [
>> data-driven-journalism-bounces at lists.okfn.org] On Behalf Of Andrew Duffy
>> [andrewjamesduffy at gmail.com]
>> Sent: Monday, April 29, 2013 12:37 AM
>> To: data-driven-journalism at lists.okfn.org
>> Subject: [ddj] SQL Vs Excel Vs Refine
>>
>> Question:
>>
>> Are there any data journalists/devs out there that can advise as to
>> whether it's worth learning SQL? So far a combination of Excel/Google
>> Refine has been more than enough for dumping, organising, and cleaning my
>> data projects, but I have only worked with spreadsheets up to ~500 rows.
>>
>> What can SQL do that refine/excel can't?
>>
>> --
>>
>> Andrew Duffy - Journalist
>>
>>
>>
>> _______________________________________________
>> data-driven-journalism mailing list
>> data-driven-journalism at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/data-driven-journalism
>> Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
>>
>
> - - - - - - - - - - - - - - - - - - - - AVISO IMPORTANTE / IMPORTANT NOTICE - - - - - - - - - - - - - - - - - - - - - -
>
> Esta mensagem pode conter informações confidenciais e somente o indivíduo ou entidade a quem foi destinada pode
> utilizá-la. A transmissão incorreta da mensagem não acarreta a perda de sua confidencialidade. Caso esta mensagem
> tenha sido recebida por engano, solicitamos que o fato seja comunicado ao remetente e que a mensagem seja eliminada
> de seu sistema imediatamente. É vedado a qualquer pessoa que não seja o destinatário usar, revelar, distribuir
> ou copiar qualquer parte desta mensagem. Ambiente de comunicação sujeito a monitoramento.
>
>
>
> This message may include confidential information and only the intended addressee have the right to use it as is, or
> any part of it. A wrong transmission does not break its confidentiality. If you've received it because of a mistake or
> erroneous transmission, please notify the sender and delete it from your system immediately. This communication environment
> is controlled and monitored.
>
>
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
>
> _______________________________________________
> data-driven-journalism mailing list
> data-driven-journalism at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/data-driven-journalism
> Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
>
>
> _______________________________________________
> data-driven-journalism mailing list
> data-driven-journalism at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/data-driven-journalism
> Unsubscribe: http://lists.okfn.org/mailman/options/data-driven-journalism
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/data-driven-journalism/attachments/20130430/d676fe47/attachment-0001.html>


More information about the data-driven-journalism mailing list