[okfn-labs] Expression of Interest: Seeking developer for extractives project data repository
Anders Pedersen
anderspeders at gmail.com
Fri Feb 5 14:43:24 UTC 2016
Hi everyone,
At the Natural Resource Governance Institute
<http://www.resourcegovernance.org/> we are currently looking for a
developer to support the development of a repository of extractives
projects. We are preparing for an alpha launch over the coming months, so
the work will be quite time-intensive. We are aware that the Terms of
Reference cover a wide range of skills, so I want to emphasize that we
want to hear from you even if you can cover only some of them.
Please see the Terms of Reference here (also pasted below):
https://docs.google.com/document/d/19Dzt2ksjgj-eQNsTdrq_gbr-h82O17XZgExu92trcWw/edit
Please get in touch with us if you have any questions. We look forward
to hearing from you!
Best,
Anders
----
Timeline:
- Delivery by March 31
Days available:
- 20-30 days
Technical Stack:
We are using a full JavaScript technology stack consisting of MongoDB,
Express, Angular and Node. The front-end site uses Jade for templating and
Stylus for CSS. All non-data components run as a Docker container. The
current code base can be found at https://github.com/NRGI/rp-org-frontend.
Some documentation is available here
<https://docs.google.com/document/d/1ZjYPU3c5RaBFMc9K5NQZw4c7qfHPQkOV3U9lPIztt_U/edit?ts=56b387d6#heading=h.bcatv44j5upj>.
The data model links four main entities: contracts (via offsite strong linkage
to ResourceContracts API
<https://github.com/NRGI/resourcecontracts.org/wiki/API>), companies (via
loose linkage to OpenCorporates API <https://api.opencorporates.com>),
concessions (via loose linkage to the Open Oil concession API
<http://openoil.net/openoil-api/>), and projects. A fifth entity, “source”,
is used to catalog sourcing for linkages and details about the main
entities. Companies, concessions and projects are populated by “facts”
which are sourced versions of truths about each entity. For example,
multiple sources may have conflicting information about where a company is
incorporated. These “facts” will allow for maintenance of the various
truths that can be filtered by source. Each fact subdocument contains a
source reference and a datapoint payload. All four main entities are
also connected via a “link” MongoDB collection. Each link consists of an
entity array (i.e. the entities that are being linked), a source reference,
and a reference for each of the linked entities. In addition to these main
models there are a number of helper collections (commodities, countries,
entity aliases, etc.).
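As a rough illustration of the fact and link structures described above, here
is a minimal sketch in plain JavaScript. The field names (field, source,
payload, entities) and all sample IDs and values are assumptions made for
illustration; they are not taken from the actual schema in the repository.

```javascript
// Hypothetical shapes; field names and values are illustrative only.
const company = {
  _id: 'company-1',
  // Each fact pairs a source reference with a datapoint payload.
  facts: [
    { field: 'country_of_incorporation', source: 'source-a', payload: 'GB' },
    { field: 'country_of_incorporation', source: 'source-b', payload: 'NL' },
    { field: 'name', source: 'source-a', payload: 'Example Mining Ltd' },
  ],
};

// A link names the entity types being linked, cites a source,
// and holds one reference per linked entity.
const link = {
  entities: ['company', 'project'],
  source: 'source-a',
  company: 'company-1',
  project: 'project-9',
};

// Return only the facts asserted by one source.
function factsBySource(entity, sourceId) {
  return entity.facts.filter((fact) => fact.source === sourceId);
}

// Collect the competing values for one field, keyed by the asserting source.
function valuesByField(entity, field) {
  const result = {};
  for (const fact of entity.facts) {
    if (fact.field === field) {
      result[fact.source] = fact.payload;
    }
  }
  return result;
}

console.log(factsBySource(company, 'source-b'));
console.log(valuesByField(company, 'country_of_incorporation'));
```

Keeping conflicting values side by side, rather than overwriting them, is what
lets the same entity be viewed through one source or another.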
Deliverables will be selected with the consultant from this list:
- Development of Express/Mongoose API and Mongoose model methods: The
current codebase contains fleshed-out models. We need a number of model
methods to deal with certain external collections. For example, until the
Open Oil API is more stable, we need to maintain a parallel database of
concessions. A possible model method would make an external API call to
pull new concession data from the Open Oil API and update our own database.
In addition, the current API implementation handles user authentication and
some basic user model data pulls. For each of the entities we will need API
PUT, POST, GET, and DELETE methods. POST, PUT and DELETE methods will need
to be protected, and GET methods will need to make use of Mongoose’s
populate functionality to deal with linked references.
- Angular controllers: There is currently a set of Jade templates
migrated from the alpha version of the site. We need a set of Angular
controllers and routing for these templates. The preference is for liberal
use of directives for anything that appears in multiple places.
- Caching: There is the potential for serious performance bottlenecks on
both the server and the client side. We need to implement a caching
mechanism to handle both the main API connection and external API calls.
- Deduplication process and UI: Multiple entities can come into the
system and later be determined to be variants of a single entity. We need
a set of methods, UI components, and a workflow for deduplicating these.
The process will merge two entities into the one designated as the main
entity and bulk update the “link” records.
- Flexible general ETL process: The alpha site consists of a Django
microservice that pulls from Google Sheets templates via a plugin,
transforms the data into RDF and loads it into a Virtuoso database. This
needs to be modified to work with Mongo. An ideal system will take in
validated data via Excel or Sheets, transform it to the appropriate document
structure for insertion, and archive it via API in an institutional CKAN
instance. This can be a modification of the existing Django app or live
within the main MEAN app.
- Customized ETL process for UK Companies House disclosures: In March, UK
Companies House will be dumping company disclosures of payments to
governments. We need a customized ETL connection to this API that will
receive the original dump and also update periodically via cronjob.
- Suggestions for project verification workflows: for example, an
extension in which an algorithm defines a category of “verified” projects
based on the number and type of sources confirming each project’s existence.
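To make the deduplication deliverable above more concrete, here is a minimal
sketch of the merge step in plain JavaScript. It works on in-memory arrays
standing in for the entity and “link” collections; the function name
mergeEntities, its parameters, and the field names are all hypothetical, and a
real implementation would presumably run as a multi-document update against
MongoDB instead.

```javascript
// Hypothetical helper: merge `duplicateId` into `mainId` for one entity type.
// `entities` and `links` are plain arrays standing in for Mongo collections.
function mergeEntities(type, mainId, duplicateId, entities, links) {
  if (mainId === duplicateId) {
    throw new Error('Cannot merge an entity into itself');
  }
  const main = entities.find((e) => e._id === mainId);
  const duplicate = entities.find((e) => e._id === duplicateId);
  if (!main || !duplicate) {
    throw new Error('Both entities must exist');
  }

  // Carry the duplicate's sourced facts over to the main entity.
  main.facts = main.facts.concat(duplicate.facts);

  // Repoint every link that referenced the duplicate (the "bulk update").
  let updatedLinks = 0;
  for (const link of links) {
    if (link[type] === duplicateId) {
      link[type] = mainId;
      updatedLinks += 1;
    }
  }

  // Return the entity list without the duplicate, plus the update count.
  const remaining = entities.filter((e) => e._id !== duplicateId);
  return { entities: remaining, updatedLinks };
}

// Usage with made-up sample data.
const sampleEntities = [
  { _id: 'c1', facts: [{ source: 's1', payload: 'GB' }] },
  { _id: 'c2', facts: [{ source: 's2', payload: 'NL' }] },
];
const sampleLinks = [
  { entities: ['company', 'project'], source: 's1', company: 'c2', project: 'p1' },
  { entities: ['company', 'project'], source: 's2', company: 'c1', project: 'p2' },
];
const merged = mergeEntities('company', 'c1', 'c2', sampleEntities, sampleLinks);
console.log(merged.updatedLinks, merged.entities.length);
```

Note that this sketch does not collapse links that become identical after
repointing, and it keeps every fact from both entities so that no sourced
claim is lost in the merge.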
An Expression of Interest should include:
- A CV or GitHub repo
- Two references
- A requested day rate
Contact:
Responses will be reviewed on a rolling basis. Please reply no later than
15th February to David Mihalyi, Economic Analyst at NRGI:
dmihalyi at resourcegovernance.org.