[openbiblio-dev] SWORD v2 dev

Mon Nov 7 16:00:40 UTC 2011

First, I think this is a good point to move discussions onto the list.
I have copied to openbiblio-dev.

Primavera - I wonder if you could respond on this issue; given your
work on the metadata guide, I think perhaps you would be well placed
to inform us of where another format such as bibjson may be useful -
and who are the people we might talk to that would be interested in
using bibjson?

Below is the content of a discussion yesterday / today between Peter
Murray-Rust, Jim Pitman, and myself:

On Mon, Nov 7, 2011 at 3:37 PM, Jim Pitman <pitman at stat.berkeley.edu> wrote:
> Richard, many thanks for your detailed reply, very instructive.
> Some reactions inline below.
>
> As Peter said in previous:
>
>> > > it highlights some of the community stuff including the centrality of JISC.
>
> and this is what I am trying to get my head around.
>
>> The process has been pretty comprehensively documented in the blog on
>> http://swordapp.org and we have a paper coming out in d-lib soon which
>> describes the use cases that we have built sword around. (pre-print attached).
>
> Good. It would be helpful if there was a link to this documentation from your conference
> video. Or maybe there was and I missed it? Why not post the pre-print publicly?
>
>> You can also see the initial SWORD 2.0 white paper at: http://sword2depositlifecycle.jiscpress.org/
>
> I see this now. A basic question is whether this publication infrastructure which you praised so highly
> in your talk is available to us for BibJSON dev. Is it? If so, then why not use it? I think that
> should be part of our plan.

We already have a blog on our website set up in which documents can be
edited. We can use the annotator to add notes if desired. At this
stage, nobody else has written on the blog but me. The logins to the
blog are OKF website logins - if people do not know their passwords, I
can reset them, but note this will change your password on all OKF
sites.

>> down at the moment, so PDF version attached).  This fulfills the purpose of
>> picking apart the previous efforts in this space (SWORD 1.3) and proposing
>> ways in which SWORD 2.0 could be better than it.  For BibJSON an equivalent
>> document might explore why BibTeX and/or other bibliographic formats are lacking, and what should be done about them.
>
> OK.  This does not have to be a long document. It could all be said in a page. The limitations of BibTeX and other
> bibliographic formats (except perhaps AMF) are so obvious they don't need much time to dismiss. As for what should be done
> about them, we are already doing it, and it would be helpful to have a living document where that was summarized, and which
> could eventually be submitted to, e.g., d-lib to explain why we are making this effort.
> There is a start on this in the original BibJSON spec, and Peter wrote an etherpad full of stuff on this while he visited me
> in Berkeley. But someone needs to pull this together into a suitable whitepaper.  That could be authored by e.g. me + Peter + Karen + Mark
> + yourself if interested.  I think you would bring a lot of experience to the effort. But I do think this doc needs to be written very
> soon, and someone has to take responsibility for pulling all the stuff from different places and putting it together.
> I would be glad to make a first pass at this. But I would like to know first how to organize the collab doc prep with the JISC tool,
> and decide if we want to do that or just use googledocs, or other framework.  We have started in etherpad, but that is very scrappy.

As above, we could do this on the website already.

>> > Further Richard, are you available to lead a comparable community dev
>> > effort for the BibJSON spec?  And if not Richard, then who? ....
>>
>> The key thing to understand is that there are actually 2 community roles in
>> the SWORD project.  I operate as Technical Lead, and it is therefore my job
>> to work with the community in extracting, analysing and discussing
>> requirements, and ultimately having the power to make decisions which I
>> feel best meet those needs (for which I sometimes get shouted at, but that's part of the process).
>
> I see that Mark and I are operationally assuming this role for BibJSON.
>
>> The second person is Stuart Lewis, a long term collaborator of mine at the University of Auckland, and he is the
>> Community Manager for SWORD, with a remit to manage the website and the
>> blog and to lead on other communications efforts such as report writing,
>> journal article writing, generally being prominent on the mailing lists in
>> a non-technical capacity, and setting development challenges for
>> conferences, and so on.
>
> Mark is doing this right now.
>
>> Neither of us would have the skills or the capacity to do both jobs.
>
> Which explains why it may be too much to expect Mark alone to do both jobs for BibJSON, not to mention also
> BibServer dev.

Yes.

>> In addition, we have been supported by SONEX in the form of Pablo de Castro, who has done sterling work in use case
>> gathering and documentation (see the attached paper).
>
> Here we can get help from others, e.g. I hope Karen or others on the openbiblio list might be able to contribute to
> use case development.  Right now we do not need any more use cases. We have about 4 or 5 fundamental use cases which
> we know very well and which should be driving development. Rufus has insisted on expanding this to 20 use cases which I think
> is crazy and distracting from the central purpose. But Rufus has nothing more than a spreadsheet, with one row per use case.
> We need to be taking the 4 or 5 fundamental ones and fleshing them out completely with end-to-end BibServer applications, which
> will in turn drive the requirements for BibJSON. This has already started to occur, but not as systematically as it should.
>
>> So, what I'm saying is that I'd be happy to explore the possibility of operating as the technical lead for BibJSON, and I could take you through
>> the process of use case gathering, analysis and community engagement,
>
> I'd definitely like to pursue this further. You should be working this out with Mark, more than me, as it is
> the two of you who will have to work most closely together on this. It seems that Mark is not keen on working on the BibJSON spec, and
> it would be better to have someone with enthusiasm for spec writing taking the lead on it.

I think our current use cases are actually all use cases for bibserver
rather than bibjson. None of the small group or departmental use cases
actually require adherence to a spec. This is one of the reasons I am
no longer keen on the bibjson spec - we have already specified some
keys to use, and how to use namespaces. It is not obvious to me that
anything else is required. Sure, there may be convention on uses of
further keys, but those conventions should come about by use - between
individuals / groups actually agreeing a need for a particular key -
rather than by spec dictation.

>> but to do the full SWORD-like approach would require additional high-level non-technical input too,
>
> you will get plenty of that from me, possibly Peter, Karen and others I hope. We have to engage and inspire others to contribute.
>
>> and some cost in staff time, resources and travel (SWORD 2.0 has been done for about £80k, and I think JISC got a great deal).
>
> We will have to keep the costs down, but we have a JISC budget for the year, and some fraction of it should be spent on BibJSON/BibSoup/BKN
> community dev. Some of this is technical community dev, which you are well positioned to do, and some is more political.
> If you can make the case for more funding for this activity, then go for it, and let's seek additional funding. There are lots of potential
> sources. But first the case has to be made.
>
>> > In any case, I would like to see a list of milestones along the road of
>> > BibJSON dev.
>> > It is important that BibJSON dev not be tied too closely to the
>> > python-based BibServer,
>> > so python BibServer is seen as just one of many possible tools which could
>> > be developed over the
>> > BibJSON data model.  We should encourage the development of BibServer
>> > clones in frameworks
>> > like Drupal, Plone, Google App Engine, RoR, or whatever people want to
>> > develop it in.
>> > Anything that consumes and emits BibJSON with a suitable license for reuse
>> > is progress towards the
>> > goal of enlarging the BibSoup of all openly accessible and reusable
>> > BibJSON records.
>> >
>> > It seems the relation between
>> >
>> > SWORD and various repo data management systems DSpace, Eprints, Fedora ,
>> > ...
>> >
>> > is largely paralleled by the relation between
>> >
>> > BibJSON and various biblio data management systems (BibSonomy, CiteUlike,
>> > Mendeley, Google Scholar, MAS, Sciencecard, RePEc, Zotero, BibDesk, JabRef,
>> > ...... )
>> > except that in the biblio space there is nothing as general as DSpace,
>> > Eprints, Fedora , ... with many institutional implementations, and there is
>> > some unavoidable
>> > conflation in the above list between biblio data management systems and
>> > particular instances thereof.
>> >
>> > It seems we have parallel socio/technical issues in persuading/goading the
>> > various data management systems towards support and use of BibJSON.
>> > If there are important differences, we should understand them, and modify
>> > our community strategy accordingly.
>> > I think technically we are on the right track, regardless of how quickly
>> > we gain recognition/support/adoption from the big
>> > biblio data suppliers.
>> >
>> > Richard, I would be particularly interested in your thoughts on these
>> > parallels, and  your advice.
>> >
>>
>> I think the parallels that you identify hold.
>>
>> I think for me the biggest issue with BibJSON right now is a lack of clear use cases.
>
> I don't think so. The personal and departmental use cases are extremely obvious and easily documented
> and already driving development. So is the small publisher/aggregator use case. There are 20 other use cases,
> but these are diffuse and relatively unimportant at present.

As above, these use cases are achievable without a bibjson spec.

>> With use cases come stakeholders and with stakeholders come the knowledge to actually extract the requirements that BibJSON needs to
>> provide.  At the moment I feel that we're just working on "another serialisation" of some metadata, but we must have carefully justified to
>> ourselves and to everyone else why this new serialisation needs to exist.
>
> On the contrary, we are working on the data model that is needed to drive BibServer applications, and to facilitate lightweight biblio data exchange.
> This is operationally defined by capabilities and needs of the existing BibServer installations. There was no biblio format in existence which met
> these requirements.  So we designed BibJSON  for this purpose. It works, and is worth consolidating and extending to comprehensively serve the
> use cases for which it was designed.

Bibserver does not actually need a data model; that is the value of a
relationless database. Parsers may parse to a particular
specification, but this is not actually a requirement to use
bibserver.
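
To illustrate the point, here is a minimal sketch of a schemaless store,
assuming records are plain JSON-like dicts. The class name, method names,
and record keys are all hypothetical and chosen for illustration only; this
is not bibserver's actual code or API.

```python
# Hypothetical sketch of a schemaless record store. Nothing here is taken
# from bibserver itself; the names and keys are illustrative assumptions.
class RecordStore:
    def __init__(self):
        self.records = []

    def add(self, record):
        # Accept any dict as-is: no required keys and no schema check.
        self.records.append(dict(record))

    def search(self, key, value):
        # Return records that happen to carry the key with this value;
        # records without the key are simply skipped.
        return [r for r in self.records if r.get(key) == value]


store = RecordStore()
# Two records with entirely different keys sit side by side in the same store.
store.add({"title": "A paper", "year": "2011"})
store.add({"booktitle": "Proceedings", "my_group_tag": "reading-list"})

hits = store.search("year", "2011")
```

Any agreement on which keys to use is then a convention between the people
supplying the records, not something the store enforces.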

>> Some examples from our previous discussions of BibJSON, that would need to be explored more deeply are:
>
>> 1/ Building remote UIs over bibliographic datasources
>> 2/ Bibliographic system to bibliographic system data transfer
>> 3/ Embedding metadata in scholarly documents
>
> Yes. These are all additional use cases. But I think they are all easily accommodated without reference to or affecting the BibJSON format standard
> very much.
> For 1/ you provide RESTful web services that can import/export JSON. Nothing new there.
> For 2/ you need to push on Krichel's resynch techniques, or OAI-PMH.  Again, general file sharing stuff, nothing specific to BibJSON.
> For 3/ you just need to provide COinS or schema.org or flavor of the month embedding. It doesn't matter too much, and this will be a chaotic space for
> a while. Again, it does not affect the structure of BibJSON.
>
>> At the moment with BibJSON we're rushing ahead to the technical
>> specification and implementation without yet having a clearly defined purpose,
>
> On the contrary, I have a very clearly defined purpose, to consolidate the personal and departmental and small publisher use cases.

>> and that will inevitably lead to arguments, lack of agreement, and overall instability.
>
> It would if we tried to deal with the 20 use cases. But I don't plan to do that. I am doing the best I can to keep Mark sharply
> focussed on the personal and departmental use cases, and we can proceed from there. If BibJSON serves only those cases, it will be worth
> maintaining. But the potential to serve other cases is rather obvious, and these will develop over time.
>
>> In SWORD, for example, I didn't write a line of code for about 6 months, and then when I did that was just to help me explore
>> the early version of the specification.  The actual coding phase of the project didn't start until 4 months later.
>
> Keep in mind that BibJSON is not new. BibServer has been under dev since 2003, and BibJSON emerged in response to BibServer needs
> in 2009.

What were those needs? There were a lot of people listed previously as
BibKN members who are no longer around. The previous requirement for
bibjson appears to have been to represent bibliographic metadata in
RDF for distribution between nodes in a network. Such architecture is
not a requirement for individuals and small groups.

>> As a first action, I'd recommend starting (maybe already done) a document which lists the use cases for BibJSON and goes into them in some detail.
>
> OK. But this takes effort, and who is going to lead that?  For Mark and myself, the needs are fairly obvious, and largely accommodated by
> now.  It is some further process of documentation of what we are doing and why that is needed. Neither of us has the time to do this as
> extensively as it should be, so it gets left undone. We have starts in the old BibJSON spec, and ether pads. But someone has to pull this
> together.

We have specs listed now at
http://bibserver.okfn.org/bibjson/specs. If there are further
community needs for clearer specification, then we need to hear them
from the community.