[School-of-data] BIG data... What is BIG?

Tom Longley tom at tacticaltech.org
Wed Jul 9 19:53:02 UTC 2014


Might GDELT fit the bill here?

"The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. Its Event Database archives contain nearly 400M latitude/longitude geographic coordinates spanning over 12,900 days, additionally making it one of the largest open-access spatio-temporal datasets in existence. It truly pushes the boundaries of "big data," weighing in at over a quarter-billion rows with 59 fields for each record, spanning the geography of the entire planet, and covering a time horizon of more than 35 years. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact, and even forecast this vast archive of human society?"
(http://www.gdeltproject.org/data.html)

The terms of use are ‘open’:

"The GDELT Project is an open platform for research and analysis of global society and thus all datasets released by the GDELT Project are available for unlimited and unrestricted use for any academic, commercial, or governmental use of any kind without fee."
(http://www.gdeltproject.org/about.html#termsofuse)

Cheers,

Tom

On 9 Jul 2014, at 20:46, Julian Tait <julian at thegarden.io> wrote:

> Hi Stefan,
> 
> I think the data being released by the National Health Service in the UK on prescribed medicines could be considered BIG: over 1GB per month, approx. 4 million lines (http://www.hscic.gov.uk/searchcatalogue?productid=12419&topics=0%2fPrescribing&infotype=0%2fOpen+data&sort=Relevance&size=10&page=1#top). It has the potential to help solve BIG problems, such as the prevalence of proprietary medication over generic (a big cost for the public health service), or whether certain areas are anomalous in the way that anti-depressants are used.
> 
> Cheers
> 
> Julian
> 
> On 9 Jul 2014, at 20:25, Stefan Urbanek wrote:
> 
>> 
>> On Jul 9, 2014, at 12:55 PM, Laura James <laura.james at okfn.org> wrote:
>> 
>>> On 9 July 2014 16:59, Stefan Urbanek <stefan.urbanek at gmail.com> wrote:
>>> 
>>> Concerning Open-Data, I have yet to see a dataset AND a problem that I could call a BIG data problem. If you know about any, I would be more than happy to hear about it. I would love to see one and touch it.
>>> 
>>> Depending on the definition of big data, something like the Ensembl human genome data? (210GB annotated version)
>>> 
>>> I imagine there may also be some pretty large NASA-generated open data sets. 
>>> 
>> 
>> Ah, you are right, thanks. I didn’t count them in, because I think of them more as “scientific data” that happen to be open. Therefore I would rephrase myself as “I have yet to see a non-scientific open dataset (or datasets) and a problem that is a big data problem” :-)
>> 
>> Stefan
>> 
>> _______________________________________________
>> school-of-data mailing list
>> school-of-data at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/school-of-data
>> Unsubscribe: https://lists.okfn.org/mailman/options/school-of-data
> 

--
Tom Longley
Associate
Tactical Technology Collective
e: tom at tacticaltech.org
w: www.tacticaltech.org
t: @tlongers

Secure 'Off the Record' chat: longley at jabber.ccc.de
PGP fingerprint: 9EAC 193F 5AEA 2E09 DB81 FB46 C640 9B6F 0640 DC3E

Our new book is out: http://visualisingadvocacy.org
