[open-humanities] OKFest Session "Open Data and the Panton Principles for the Humanities"

John Levin john at anterotesis.com
Sun Jul 27 10:05:16 UTC 2014


Dear everyone, & James especially,

I think the question here is how much the original source is changed by 
the author: whether it is turned into data, and what methodology and 
tools are used to do so.

In the example given, JHS is writing an article about death in Hamlet 
through a close reading of the text. If he were writing about death via a 
more textual/linguistic analysis, using a concordance for example, I 
think the concordance could be considered the data, and should be 
released or at least the steps to reproduce the concordance should be 
released. (This could be as simple as a url for a search for the word 
'death' conducted on Hamlet @ Open Shakespeare.) It's not just about the 
original data, but also methods and tools used to examine it.
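
To make "the steps to reproduce the concordance" concrete, here is a 
minimal keyword-in-context sketch in Python. It is only an illustration 
under assumptions: the function name kwic, the context width and the 
two-line sample are hypothetical choices, and this is not how Open 
Shakespeare's search works.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, match, right context) for each whole-word hit."""
    # Crude tokeniser (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Two lines from Hamlet 3.1, standing in for the full text.
sample = (
    "For in that sleep of death what dreams may come\n"
    "But that the dread of something after death"
)

for left, word, right in kwic(sample, "death"):
    print(f"{left:>25} [{word}] {right}")
```

Releasing a script like this together with a pointer to the edition used 
would let a reader regenerate the concordance rather than take it on 
trust.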

Enlarging the example, say to an article on death in Elizabethan plays, 
a corpus of texts would be created; that would be data, and certainly 
requires release in some form - as a bibliography rather than all the 
texts themselves, but nevertheless this is important data.

For history, I believe that in the course of research the historian 
produces all sorts of important information in forms that do not enter 
the "final printed product." I find that I produce a lot of useful data, 
such as chronologies. However, this sort of supporting material is not 
generally released by historians today. (I will be releasing such 
material as I produce for my PhD.)

I also act upon the data sources I am using - for example, extracting 
names of debtors from the London Gazette, tabulating them, modernising 
spellings, making place names clear and geo-locating them. This 
covers methodology (the way in which I work up the data from the 
digitized sources), the derived data produced from the raw source, and 
the code used to manipulate / examine the data.
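
As a rough sketch of what such a step might look like in code (an 
assumption for illustration, not my actual scripts): the records below 
are invented, not taken from the London Gazette, and the lookup tables 
SPELLINGS and PLACES are hypothetical. Geo-locating is left out.

```python
import csv
import io

# Hypothetical lookup tables; in practice these would be compiled by hand.
SPELLINGS = {"Smyth": "Smith", "Taylour": "Taylor"}
PLACES = {"Southwarke": "Southwark"}

# Invented records for illustration only.
raw = [
    ("John Smyth", "Southwarke"),
    ("Mary Taylour", "Westminster"),
]

def normalise(name, place):
    """Keep the reading as printed alongside the modernised form."""
    modern_name = " ".join(SPELLINGS.get(word, word) for word in name.split())
    return {
        "name_as_printed": name,
        "name_modern": modern_name,
        "place_as_printed": place,
        "place_modern": PLACES.get(place, place),
    }

rows = [normalise(name, place) for name, place in raw]

# Tabulate as CSV so the derived data can be released with the code.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the original spelling next to the modernised one means the 
derived table can still be checked against the raw source.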

(And again, these lists will be released by me under a free and open 
license.)

These are examples of the data that humanists produce and could/should 
release.

Not every humanistic work produces data, as James' case study shows. But 
I would guess most do. And does every scientific work do so? What of an 
analysis of a single set of previously published data, checking for 
consistency?

There are grey areas with regard to the Panton Principles, for example on 
graphs and tables:
http://pantonprinciples.org/faq/#Q16_What_sort_of_material_is_data_Can_graphs_tables_etc_be_marked_as_Open_Data

Personally, I don't think we should get too hung up on definitions and 
trying to catch every eventuality. It's enough to say the text, the data 
and the tools.

John


On 25/07/2014 13:17, James Harriman-Smith wrote:
> Hi everyone,
>
> It's taken me a little while, but I've finally had a chance to go
> through the notes from the session. They make for a very interesting
> read. One thing in particular caught my eye, and it was the difficulty
> over using the term 'open data' to describe the objects and products of
> research in the Humanities. This is something we tried to fudge with the
> word 'works' in the original drafts a few years back, but clearly needs
> to be addressed in detail.
>
> Taking up the suggestion in the notes to think about this on a
> discipline by discipline level, here's a little thought experiment,
> based on my own work as a student of literature (with a very rough
> science example in the background).
>
> - I write an article on death in Hamlet (a scientist might write an
> article on the common cold in humans)
> - Data? My observations on the play, in the form of chosen quotations,
> notes, and selections from criticism (the scientist's observations on
> patients, supported with a reading of relevant recent research)
> - Note that my data is *not* the entire play, this would be the
> equivalent of the scientist's human subject
> - In other words, Hamlet is *a source of data* for my work
>
> Moving on to publication
> - When submitting his article, the scientist also publishes the
> (anonymised) dataset he has collected on the common cold in line with
> the Panton Principles, allowing his data to be reused and verified by others
> - I submit my article, and also publish openly... what? my notes and
> chosen quotations will already be in my article, so there won't be much
> left over.
> - I could publish all the material behind my article, notes and
> quotations: but, in making this readable, have I not just rewritten my
> article in the public domain?
> - I suppose just an anthology of quotations (providing all my sources
> were PD) might be a useful thing to have available, but the presence of
> such a thing would not save anyone following me from having to reread
> the sources in full anyway...
>
> What can we conclude from this little experiment for literature?
> - There is a distinction between the researcher's dataset and his or her
> source.
> - The easiest way to make the dataset open would be pushing for open
> access to the researcher's article, since the dataset will be heavily
> embedded in it
> - An alternative option would be open publication of a collection of all
> the quotations, etc. used in the article, but this would involve a
> change in academic practice, the use of PD sources only, and is of
> limited utility for future researchers anyway.
>
> Comments? Questions? Perhaps someone from another discipline could try
> something similar? @John - what would this look like for a historian?
>
> J
>
>
>
> On 22 July 2014 11:55, Peter Kraker <peter.kraker at tugraz.at> wrote:
>
>     Dear all,
>
>     James Harriman-Smith asked me to post the results of the OKFest
>     Session on "Open Data and the Panton Principles for the Humanities"
>     to this list
>
>     The session was well attended, and with Peter Murray-Rust we had one
>     of the original authors in the audience. I first gave a short
>     introduction to Open Data and the Panton Principles which you can
>     find here: http://de.slideshare.net/pkraker/opendata-humanities
>
>     Afterwards, we had an animated discussion on the issues of open data
>     in the humanities. Then, I asked the participants to go into groups
>     of 3 or 4 to come up with solutions to these issues. The discussion
>     and the solutions were captured by two participants in the Etherpad:
>     https://pad.okfn.org/p/Panton_Principles_for_the_Humanities
>
>     All in all, the session went very well. I hope that the outcomes are
>     useful for your adaptation of the Panton Principles to the
>     humanities. If you have any questions, please let me know!
>
>     Best,
>     Peter


-- 
John Levin
http://www.anterotesis.com
http://twitter.com/anterotesis



