[okfn-labs] The future of Nomenklatura

Rufus Pollock rufus.pollock at okfn.org
Wed Nov 13 08:23:02 UTC 2013


On 9 November 2013 14:04, Friedrich Lindenberg <friedrich at pudo.org> wrote:

> Hi all,
>
> while I've been using nomenklatura successfully in a variety of services
> for the past couple of months, it hasn't really spread and found more
> users. At the same time, I'm beginning to meet its limitations with larger
> datasets.
>
>
> Problems with nomenklatura
> --------------------------
>
> Some of the problems that people have reported have been about
> understanding what the service does in the first place, as well as the
> quality of the current implementation (e.g. the upload has been partially
> broken and its UI is cryptic).
>

I do think the UI / tutorial is possibly the bigger blocker, rather than the
actual functionality (but I say that as someone who has only used it to a
limited extent). When pointing it out to others, I've certainly struggled to
find an obvious "getting started" manual.

I wonder if it would be worth writing out, before doing much new coding,
what a perfect tutorial would look like (focused on nomenklatura as it is to
start with, but perhaps then adding comments about places you'd want to
modify). You could perhaps think of two tutorials: one for a coder setting
up and the other for a less technical Refine user.

> Beyond that, there are several limitations to nomenklatura. One is the lack
> of a clustering mechanism. The tool only compares entity labels one-to-one,
> rather than trying to create larger groups - like, for example, Refine does
> in its "Cluster & Edit" mode. This makes it harder to crunch large datasets
> effectively.
>
> At the same time, nomenklatura's notion of datasets prevents the service
> from helping users to discover links across datasets - e.g. a list of all
> EU lobbyists might overlap with those companies competing for EU tenders.
>
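
For anyone on the list who hasn't used Refine's "Cluster & Edit": my
understanding is that its simplest method groups labels by a normalised
key rather than comparing them pair by pair. Below is a rough sketch of
that idea in Python. It is only loosely modelled on Refine's fingerprint
keyer and is not what nomenklatura currently does; the function names are
made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(label):
    """Reduce a label to a normalised key (loosely after Refine's fingerprint)."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and strip punctuation.
    text = re.sub(r"[^\w\s]", "", text.lower())
    # Split on whitespace, de-duplicate and sort the tokens.
    return " ".join(sorted(set(text.split())))


def cluster(labels):
    """Group labels sharing a fingerprint; keep groups with 2+ distinct labels."""
    groups = defaultdict(set)
    for label in labels:
        groups[fingerprint(label)].add(label)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["Siemens AG", "siemens ag.", "AG Siemens", "Deutsche Bank"]
    for group in cluster(names):
        print(group)
```

The point is that each label is keyed once and candidates fall out of
the shared keys, so there is no need to compare every pair of labels.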

My sense is that the link problem may be something different (though
important) and, being another big chunk of work, might be best kept
separate to start with.

This also makes me wonder: do you have a list of your key user stories?
That might help clarify which things go into the minimal viable
enhancement and which don't.


>
> Proposed approach
> -----------------
>
> To tackle these issues and to make nomenklatura more attractive for new
> users, I'm considering a fairly radical re-framing of the service. This
> would include the following changes:
>

As above, I think writing a proper tutorial for nomenklatura as it stands
today would be a really valuable use of time before you get into coding, but
I know coding is more fun ;-)


> * Limit the semantics of the service to only recognize social entities,
> i.e. people, companies, public bodies and similar items. This should help
> clarify the use case and make the service easier to understand.
> * Create a global ID space and generate one URI per entity, independent of
> its source dataset.
> * Replace datasets with "contexts", where one entity can be part of
> multiple contexts.
> * Build out a clustering mode inspired by Refine that can work either
> within a context or globally.
> * Use Popolo-inspired microformats to store further attributes for each
> entity.
>

It may be useful to articulate the user stories behind each of these (a bit).
This seems quite "meaty" and it may be useful to prioritize in some way.
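
To check I'm reading the global ID space and "contexts" idea correctly,
here is a minimal sketch of how I picture it. This is my assumption, not
Friedrich's design: the base URI is a placeholder, and the class, method
and context names are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field

# Placeholder base URI, purely for illustration.
BASE_URI = "http://example.org/entities/"


@dataclass
class Entity:
    label: str
    contexts: set = field(default_factory=set)
    # One global ID per entity, independent of where it came from.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def uri(self):
        return BASE_URI + self.id


class Registry:
    """In-memory stand-in for the entity store."""

    def __init__(self):
        self.entities = {}

    def add(self, label, context):
        entity = Entity(label=label, contexts={context})
        self.entities[entity.id] = entity
        return entity

    def tag(self, entity_id, context):
        # The same entity can belong to several contexts.
        self.entities[entity_id].contexts.add(context)

    def shared(self, first, second):
        # Entities that appear in both contexts.
        return [e for e in self.entities.values()
                if first in e.contexts and second in e.contexts]


if __name__ == "__main__":
    registry = Registry()
    acme = registry.add("ACME Ltd", "eu-lobbyists")
    registry.tag(acme.id, "eu-tenders")
    print(acme.uri, registry.shared("eu-lobbyists", "eu-tenders"))
```

If that is roughly right, the "overlap between lobbyists and tenderers"
story becomes a simple query over contexts rather than a cross-dataset
matching job.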

> Technically, this would be accomplished by:
>
> * Switching to MongoDB for storage
> * Re-building the UI in AngularJS
>
> The advantages of this approach would be:
>
> * Creates links between datasets, aiming towards a flexible, re-usable
> entity namespace.
> * Provides a richer set of entities to cluster with, thus hopefully better
> data integration.
> * Could more easily serve as a backend to publicbodies.org
>

> I'm keen to hear what people think about this kind of plan, and if anyone
> wants to contribute to such an effort - or knows about existing efforts
> that this could pair up with!
>

Very excited to see these new developments and will aim to contribute where
I can :-)

Rufus


>
> Cheers,
>
> - Friedrich
>
> _______________________________________________
> okfn-labs mailing list
> okfn-labs at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/okfn-labs
> Unsubscribe: http://lists.okfn.org/mailman/options/okfn-labs
>
>


-- 


Rufus Pollock
Founder and Executive Director | skype: rufuspollock | @rufuspollock <https://twitter.com/rufuspollock>
The Open Knowledge Foundation <http://okfn.org/>
Empowering through Open Knowledge
http://okfn.org/ | @okfn <http://twitter.com/OKFN> | OKF on Facebook <https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/> | Newsletter <http://okfn.org/about/newsletter>