[okfn-labs] Expression of Interest: Seeking developer for extractives project data repository

Anders Pedersen anderspeders at gmail.com
Fri Feb 5 14:43:24 UTC 2016

Hi everyone,

At the Natural Resource Governance Institute
<http://www.resourcegovernance.org/> we are currently looking for a
developer to support the development of a repository of extractives
projects. We are preparing for an alpha launch over the coming months, so
the work will be quite time-intensive. We are aware that the Terms of
Reference cover a wide range of skills, so I want to emphasize that we
want to hear from you even if you can cover only some of them.

Please see the Terms of Reference, pasted below:

Please get in touch with us if you have any questions. We look forward
to hearing from you!

   Delivery by March 31

Days available:


   20-30 days

Technical Stack:

We are using a full JavaScript stack consisting of MongoDB, Express,
Angular and Node (MEAN). The front-end site uses Jade for templating and
Stylus for CSS. All non-data components run in a Docker container. The
current code base can be found at https://github.com/NRGI/rp-org-frontend.
Some documentation is available here.

The data model links four main entities: contracts (via offsite strong
linkage to the ResourceContracts API
<https://github.com/NRGI/resourcecontracts.org/wiki/API>), companies (via
loose linkage to the OpenCorporates API <https://api.opencorporates.com>),
concessions (via loose linkage to the Open Oil concession API
<http://openoil.net/openoil-api/>), and projects. A fifth entity, “source”,
is used to catalog the sourcing for linkages and for details about the main
entities. Companies, concessions and projects are populated by “facts”,
which are sourced versions of truths about each entity. For example,
multiple sources may have conflicting information about where a company is
incorporated. Facts make it possible to maintain these competing truths
and to filter them by source. Each fact subdocument contains a source
reference and a datapoint payload. All four main entities are also
connected via a “link” Mongo collection. Each link consists of an entity
array (i.e. the entities being linked), a source reference, and a
reference to each of the linked entities. In addition to these main
models there are a number of helper collections (commodities, countries,
entity aliases, etc.).
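To make the fact and link structures concrete, here is a minimal sketch in plain JavaScript. The field names (`facts`, `field`, `value`, `entities`) and the helpers `factsBySource` and `linkedIds` are hypothetical illustrations, not the actual schemas in rp-org-frontend; in the real app these would be Mongoose documents and queries rather than in-memory objects.

```javascript
// Hypothetical documents mirroring the data model; field names are assumptions,
// not the actual rp-org-frontend Mongoose schemas.
const exampleCompany = {
  _id: "company-1",
  name: "Example Mining Ltd",
  facts: [
    // Two sources disagree about the country of incorporation.
    { source: "source-a", field: "country_of_incorporation", value: "GB" },
    { source: "source-b", field: "country_of_incorporation", value: "JE" },
  ],
};

// A link document: the linked entity types, a source reference,
// and one reference per linked entity.
const exampleLink = {
  entities: ["company", "project"],
  source: "source-a",
  company: "company-1",
  project: "project-7",
};

// Values of one field, keeping only facts that come from the given sources.
function factsBySource(entity, field, sources) {
  return entity.facts
    .filter((f) => f.field === field && sources.includes(f.source))
    .map((f) => ({ source: f.source, value: f.value }));
}

// Ids of entities of `otherType` linked to entity `id` of type `type`.
function linkedIds(links, type, id, otherType) {
  return links
    .filter((l) => l.entities.includes(type) && l.entities.includes(otherType) && l[type] === id)
    .map((l) => l[otherType]);
}

factsBySource(exampleCompany, "country_of_incorporation", ["source-b"]);
// => [{ source: "source-b", value: "JE" }]
linkedIds([exampleLink], "company", "company-1", "project");
// => ["project-7"]
```

Keeping each conflicting value as its own sourced fact, rather than overwriting a single field, is what lets a view show "the truth according to source X".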

Deliverables will be selected with the consultant from this list:


   Development of Express/Mongoose API and Mongoose model methods: The
   current codebase contains fleshed-out models, but we need a number of
   model methods to deal with certain external collections. For example,
   until the Open Oil API is more stable, we need to maintain a parallel
   database of concessions. A possible model method would make an external
   API call to pull new concession data from the Open Oil API and update
   our own database. In addition, the current API implementation handles
   user authentication and some basic user model data pulls. For each of
   the entities we will need API GET, POST, PUT and DELETE methods. POST,
   PUT and DELETE methods will need to be protected, and GET methods will
   need to use Mongoose’s populate functionality to resolve linked
   references.

   Angular controllers: There is currently a set of Jade templates
   migrated from the alpha version of the site. We need a set of Angular
   controllers and routing for these templates. The preference is for
   liberal use of directives for elements that appear in multiple places.

   Caching: There is potential for serious performance bottlenecks on both
   the server and client side. We need to implement a caching mechanism
   that covers both the main API connection and external API calls.

   Deduplication process and UI: Multiple entities sometimes come into
   the system that are later determined to be variants of a single entity.
   We need a set of methods, UI components, and a workflow for
   deduplicating these. The process will merge two entities into the one
   designated as the main entity and bulk-update the “link” records.

   Flexible general ETL process: The alpha site consists of a Django
   microservice that pulls from Google Sheets templates via a plugin,
   transforms the data into RDF and loads it into a Virtuoso database. This
   needs to be modified to work with Mongo. An ideal system will take in
   validated data via Excel or Sheets, transform it into the appropriate
   document structure for insertion, and archive it via API in an
   institutional CKAN instance. This can be a modification of the existing
   Django app or live within the main MEAN app.

   Customized ETL process for UK Companies House disclosures: In March, UK
   Companies House will release a dump of company disclosures of payments
   to governments. We need a customized ETL connection to this API that
   will receive the original dump and then update periodically via a cron
   job.

   Suggestions for project verification workflows: for example, an
   extension in which an algorithm defines a category of “verified”
   projects based on the number and type of sources confirming their
   existence.
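For the concession model method, a rough sketch of the update logic might look like the following. It assumes the records have already been fetched from the external API (the network call is left out) and uses a `Map` as a stand-in for the local Mongo collection; `applyConcessionUpdates` and the `{ id, ...fields }` record shape are hypothetical.

```javascript
// Sketch of a model-method core: apply concession records already fetched from an
// external API (e.g. Open Oil) to a local store. The record shape and the Map-based
// store are assumptions standing in for a Mongoose collection.
function applyConcessionUpdates(remoteRecords, store) {
  let inserted = 0;
  let updated = 0;
  for (const record of remoteRecords) {
    const existing = store.get(record.id);
    if (!existing) {
      store.set(record.id, { ...record });
      inserted++;
    } else if (Object.keys(record).some((k) => existing[k] !== record[k])) {
      // Shallow comparison: nested values would need a deep comparison.
      // Overwrite changed fields while keeping any locally added fields.
      store.set(record.id, { ...existing, ...record });
      updated++;
    }
  }
  return { inserted, updated };
}

const localConcessions = new Map([["c1", { id: "c1", name: "Block A", status: "active" }]]);
applyConcessionUpdates(
  [
    { id: "c1", name: "Block A", status: "relinquished" },
    { id: "c2", name: "Block B", status: "active" },
  ],
  localConcessions
);
// => { inserted: 1, updated: 1 }
```

Against a real collection, the same insert-or-update step could be expressed in Mongoose with `Model.bulkWrite` and `updateOne` operations that set `upsert: true`.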
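One possible starting point for the caching deliverable is an in-process time-to-live (TTL) cache wrapped around any function, such as a call to the main API or to OpenCorporates. This is only a sketch: `withTtlCache`, the default TTL and the key function are assumptions, and a multi-process deployment would more likely use a shared store.

```javascript
// Sketch of an in-process TTL cache that can wrap a call to the main API or to an
// external API. The default TTL and key function are illustrative assumptions.
function withTtlCache(fn, { ttlMs = 60000, keyFn = (...args) => JSON.stringify(args), now = Date.now } = {}) {
  const entries = new Map(); // key -> { value, expires }
  return function cached(...args) {
    const key = keyFn(...args);
    const hit = entries.get(key);
    if (hit && hit.expires > now()) return hit.value;
    const value = fn(...args);
    entries.set(key, { value, expires: now() + ttlMs });
    // If fn returned a Promise, caching it also collapses concurrent identical
    // requests; drop the entry on rejection so a failed call is retried next time.
    if (value && typeof value.then === "function") {
      value.then(undefined, () => {
        if (entries.get(key)?.value === value) entries.delete(key);
      });
    }
    return value;
  };
}
```

The `now` option exists so the clock can be swapped out in tests; in production it simply defaults to `Date.now`.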
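The deduplication step described above (merge into the main entity, then bulk-update links) could be sketched as below. The document shapes, the `aliases` array and `mergeEntities` itself are hypothetical; the code works on plain objects so the logic is visible.

```javascript
// Sketch: merge a duplicate entity into the main one and repoint "link" records.
// Document shapes (facts with source/field/value, links holding one reference per
// entity type, an aliases array) are illustrative assumptions.
function factKey(f) {
  return JSON.stringify([f.source, f.field, f.value]);
}

function mergeEntities(main, duplicate, links, type) {
  const merged = { ...main, facts: [...main.facts], aliases: [...(main.aliases || [])] };
  const seen = new Set(merged.facts.map(factKey));
  // Keep every fact from both records, skipping exact repeats.
  for (const f of duplicate.facts) {
    if (!seen.has(factKey(f))) {
      merged.facts.push(f);
      seen.add(factKey(f));
    }
  }
  // Remember the duplicate's id so old references can still be resolved.
  merged.aliases.push(duplicate._id);
  // "Bulk update": every link that pointed at the duplicate now points at the main entity.
  let relinked = 0;
  const updatedLinks = links.map((l) => {
    if (l[type] !== duplicate._id) return l;
    relinked++;
    return { ...l, [type]: main._id };
  });
  return { merged, updatedLinks, relinked };
}
```

In MongoDB itself the relinking would correspond to a single `updateMany({ [type]: duplicateId }, { $set: { [type]: mainId } })` on the link collection, followed by removing the duplicate document.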
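For the generic ETL process, the "transform" step might take rows already parsed from an Excel or Sheets template and produce fact-based documents, collecting validation errors along the way. The column names and `rowsToCompanyDocs` are hypothetical, not an existing NRGI template; the load and CKAN archiving steps are left out.

```javascript
// Sketch of the "transform" step of a generic ETL: rows read from an Excel/Sheets
// template (already parsed into objects) become documents ready for insertion.
// The column names below are hypothetical, not an existing NRGI template.
function rowsToCompanyDocs(rows) {
  const byName = new Map();
  const errors = [];
  rows.forEach((row, i) => {
    const name = (row.company_name || "").trim();
    if (!name || !row.source_id) {
      // i + 2 assumes a single header row in the sheet.
      errors.push(`row ${i + 2}: missing company_name or source_id`);
      return;
    }
    if (!byName.has(name)) byName.set(name, { name, facts: [] });
    const country = (row.country_of_incorporation || "").trim();
    if (country) {
      byName.get(name).facts.push({ source: row.source_id, field: "country_of_incorporation", value: country });
    }
  });
  return { docs: [...byName.values()], errors };
}
```

Returning errors alongside the documents, rather than throwing on the first bad row, makes it easier to report every problem in an uploaded sheet at once.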
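As one illustration of a verification rule "based on the number and type of sources", a weighted score with a threshold could work as below. The source types, weights, threshold and the extra "at least one official source" condition are all hypothetical policy choices, shown only to make the idea concrete.

```javascript
// Sketch of a rule-based verification category. The source types, weights, threshold
// and the "needs one official source" rule are all hypothetical policy choices.
const SOURCE_WEIGHTS = { contract: 3, government: 2, company_report: 2, news: 1 };
const OFFICIAL_TYPES = new Set(["contract", "government"]);
const VERIFIED_THRESHOLD = 4;

function verificationStatus(project, sourcesById) {
  // Count each distinct source once; unknown source types contribute nothing.
  const types = [...new Set(project.sourceIds)].map((id) => sourcesById[id]?.type);
  const score = types.reduce((sum, t) => sum + (SOURCE_WEIGHTS[t] || 0), 0);
  const hasOfficial = types.some((t) => OFFICIAL_TYPES.has(t));
  return { score, verified: score >= VERIFIED_THRESHOLD && hasOfficial };
}
```

Requiring an official source alongside the score stops a project from being marked verified on the strength of many news reports alone.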

An Expression of Interest should include:


   A CV or github repo

   Two references

   A requested day rate

Responses will be reviewed on a rolling basis. Please reply no later than
15th February to David Mihalyi, Economic Analyst at NRGI:
dmihalyi at resourcegovernance.org.