[wdmmg-dev] Mark's AidData

Mark Brough mark.brough at publishwhatyoufund.org
Mon Jun 20 10:26:14 UTC 2011


Hi Friedrich

Great. Actually, this is good timing, as I got confused about what I was
trying to do with normalising the countries and organisations and ended up
with an almost endless process. So I'm going to try to re-work most of the
import controller. But I'm also starting to feel that building a complicated
relational database myself may not be worth it just to show some nice
pictures...

Wednesday sounds good. I'm going to Budapest on Wednesday evening but most
of the day would be fine.

*CSV Mapping*
I don't have a CSV mapping; I just parse the XML directly into the
database in one massive function, which you can see here:
https://github.com/markbrough/IATI-Data/blob/master/app/controllers/iatiregistry_controller.rb

I didn't try to import the IATI data into OpenSpending because I couldn't
get existing packages to work, so I figured creating my own would be even
less likely! But I'll have a think about:
a) What else would need to be added to or changed in iati2csv (and your
mapping) to make it complete and hopefully work for DFID/WB/any future IATI
data
b) Whether there's any information in an activity which is not shared by all
of its transactions - I think there might be, but I'm not sure - and whether
this matters.
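
To make point (b) concrete, here is a minimal sketch of flattening IATI XML into one row per transaction, with the activity-level fields repeated on every row. It is not how iati2csv works; the function name, the output column names and the sample data are all hypothetical, and the element names are only my understanding of the IATI activity schema.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape of IATI activity XML (values are made up).
SAMPLE = """<iati-activities>
  <iati-activity>
    <iati-identifier>XX-1-EXAMPLE</iati-identifier>
    <title>Example activity</title>
    <transaction>
      <transaction-type code="D"/>
      <value value-date="2011-01-01">1000</value>
    </transaction>
    <transaction>
      <transaction-type code="C"/>
      <value value-date="2010-06-01">5000</value>
    </transaction>
  </iati-activity>
</iati-activities>"""


def flatten(xml_text):
    """Return one dict per transaction, copying activity-level fields onto it."""
    rows = []
    root = ET.fromstring(xml_text)
    for activity in root.iter("iati-activity"):
        # Fields that belong to the activity and are shared by its transactions.
        shared = {
            "iati_identifier": activity.findtext("iati-identifier", default="").strip(),
            "title": activity.findtext("title", default="").strip(),
        }
        for tx in activity.findall("transaction"):
            row = dict(shared)
            tx_type = tx.find("transaction-type")
            value = tx.find("value")
            row["transaction_type"] = tx_type.get("code", "") if tx_type is not None else ""
            row["value"] = (value.text or "").strip() if value is not None else ""
            row["value_date"] = value.get("value-date", "") if value is not None else ""
            rows.append(row)
    return rows
```

Anything that exists only once per activity but cannot sensibly be repeated on each transaction row (for example a list of several sectors) would need its own handling; this sketch ignores that case.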

*Import CSV/XML*
I take the point about the maintenance nightmare - although at the same
time, it would be nice for there to be some way to update reasonably easily
from the IATI Registry as:
a) new donors publish (should be another 7 or so by November)
b) existing donors update their data (DFID last updated about 2 weeks ago -
I think they do so every month).
c) I'm thinking about building an example CSV to IATI converter - where you
upload your aid data (e.g. Estonia <https://rakendused.vm.ee/akta/andmed.php>,
Norway <http://www.norad.no/norskbistanditall/> or
PEPFAR <http://www.cgdev.org/content/publications/detail/1422023/>),
map it to IATI fields, and it gives it back to you in IATI XML. Is that a
good idea?
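
A rough sketch of what the core of such a converter might look like, assuming a user-supplied mapping from CSV column names to IATI element names. The column names, the mapping and the function name are hypothetical, and the output is nowhere near a complete or valid IATI file; it only shows the mapping step.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from a donor's CSV columns to IATI element names.
MAPPING = {"project_id": "iati-identifier", "project_name": "title"}


def csv_to_iati(csv_text, mapping, amount_column="amount"):
    """Turn each CSV row into an <iati-activity> with a single transaction."""
    root = ET.Element("iati-activities")
    for record in csv.DictReader(io.StringIO(csv_text)):
        activity = ET.SubElement(root, "iati-activity")
        # Copy each mapped column into the element the user chose for it.
        for column, element in mapping.items():
            ET.SubElement(activity, element).text = record[column]
        # Assumption: one amount column per row becomes one transaction value.
        transaction = ET.SubElement(activity, "transaction")
        ET.SubElement(transaction, "value").text = record[amount_column]
    return ET.tostring(root, encoding="unicode")
```

For example, `csv_to_iati("project_id,project_name,amount\nEE-1,Schools,2500\n", MAPPING)` returns a single `iati-activities` document containing one activity.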

On the other hand, I guess more manual processing does sound like it could
be better for tidying up data before import. And there are some cool
possibilities, like pulling in geo-coded data for each WB project (which
isn't in their IATI data but it is normally in the Mapping for Results data)
via this: http://api.worldbank.org/api/projects -- example:
http://search.worldbank.org/api/projects?qterm=*:*&fl=id,location&countrycode[]=IN&format=json
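
As a small sketch, the example query above could be built per country like this. The function name is hypothetical, the parameters are simply the ones in the example URL, and the code only builds the URL; it does not check that the endpoint responds.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.worldbank.org/api/projects"


def projects_url(country_code, fields=("id", "location")):
    """Build a projects query URL for one country, mirroring the example above."""
    params = [
        ("qterm", "*:*"),
        ("fl", ",".join(fields)),
        ("countrycode[]", country_code),
        ("format", "json"),
    ]
    # Keep *, :, comma and brackets unescaped so the URL matches the example.
    return BASE_URL + "?" + urlencode(params, safe="*:,[]")
```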

*My errors with OS*
Re my installation of OpenSpending (on Ubuntu 11.04), looking at the Uganda
dataset, this works fine:
http://127.0.0.1:5000/dataset/uganda/dimension/from
http://127.0.0.1:5000/dataset/uganda/dimension/to

This gives error 500 (attached error from the paster and solr consoles):
http://127.0.0.1:5000/dataset/uganda

(I ran *paster load uganda* (with some new-ish but not the final data) and
got no errors; these are the last few lines:
2011-06-20 10:32:46,910 INFO  [wdmmg.lib.loader] uganda loaded 11000 in 0.89s
2011-06-20 10:32:47,181 INFO  [wdmmg.lib.cubes] compute cube for dataset 'uganda', cube name: 'default', dimensions: 'to, from, swg, sector_objective, year'
2011-06-20 10:32:49,786 INFO  [wdmmg.lib.cubes] Done. Took: 2s

I tried *paster load cra* with the CRA dataset as well, and it looked like
everything was going OK until I got this error:
IOError: [Errno 2] No such file or directory: '/home/pwyf/env/wdmmg/pylons_data/getdata/ukgov-finances-cra/nuts1_population_2006.csv'
)

Maybe I need to update? (Should I delete /wdmmg and then hg clone it back
again? I presume I'd need to clear the database as well - I don't know
MongoDB either...)
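
The attached paster log ends in "[Errno 111] Connection refused" raised from solrpy, which usually means nothing was accepting connections on the configured Solr address at that moment. A minimal sketch for checking that from Python; the function name is hypothetical, and the host and port in the usage note are assumptions (8983 is Solr's usual default, not necessarily what this install is configured with).

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" as well as timeouts.
        return False
```

For example, `is_listening("127.0.0.1", 8983)` would show whether anything is listening on that port; it says nothing about whether the Solr schema matches what wdmmg expects.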

cheers
Mark
-------------- next part --------------
2011-06-20 10:21:40,574 INFO  [wdmmg.lib.browser] {'sort': 'score desc, amount desc', 'fq': ['+is_aggregate:false', u'+dataset.name:cra', '+is_aggregate:false'], 'rows': 0, 'stats': 'true', 'q': '*:*', 'start': 0, 'stats_field': 'amount'}
2011-06-20 10:21:40,576 WARNI [wdmmg.lib.base] Request to /dataset/cra took 5ms
Error - <class 'socket.error'>: [Errno 111] Connection refused
URL: http://127.0.0.1:5000/dataset/cra
File '/home/pwyf/env/lib/python2.7/site-packages/WebError-0.10.2-py2.7.egg/weberror/errormiddleware.py', line 162 in __call__
  app_iter = self.application(environ, sr_checker)
File '/home/pwyf/env/lib/python2.7/site-packages/repoze.who-2.0a4-py2.7.egg/repoze/who/middleware.py', line 87 in __call__
  app_iter = app(environ, wrapper.wrap_start_response)
File '/home/pwyf/env/lib/python2.7/site-packages/beaker/middleware.py', line 73 in __call__
  return self.app(environ, start_response)
File '/home/pwyf/env/lib/python2.7/site-packages/beaker/middleware.py', line 152 in __call__
  return self.wrap_app(environ, session_start_response)
File '/home/pwyf/env/lib/python2.7/site-packages/Routes-1.12.3-py2.7.egg/routes/middleware.py', line 131 in __call__
  response = self.app(environ, start_response)
File '/home/pwyf/env/lib/python2.7/site-packages/Pylons-1.0-py2.7.egg/pylons/wsgiapp.py', line 107 in __call__
  response = self.dispatch(controller, environ, start_response)
File '/home/pwyf/env/lib/python2.7/site-packages/Pylons-1.0-py2.7.egg/pylons/wsgiapp.py', line 312 in dispatch
  return controller(environ, start_response)
File '/home/pwyf/env/wdmmg/wdmmg/lib/base.py', line 65 in __call__
  return WSGIController.__call__(self, environ, start_response)
File '/home/pwyf/env/lib/python2.7/site-packages/Pylons-1.0-py2.7.egg/pylons/controllers/core.py', line 211 in __call__
  response = self._dispatch_call()
File '/home/pwyf/env/lib/python2.7/site-packages/Pylons-1.0-py2.7.egg/pylons/controllers/core.py', line 162 in _dispatch_call
  response = self._inspect_call(func)
File '/home/pwyf/env/lib/python2.7/site-packages/Pylons-1.0-py2.7.egg/pylons/controllers/core.py', line 105 in _inspect_call
  result = self._perform_call(func, args)
File '/home/pwyf/env/lib/python2.7/site-packages/Pylons-1.0-py2.7.egg/pylons/controllers/core.py', line 57 in _perform_call
  return func(**args)
File '/home/pwyf/env/wdmmg/wdmmg/lib/restapi.py', line 27 in view
  return self._view(id=id, format=format)
File '/home/pwyf/env/wdmmg/wdmmg/lib/restapi.py', line 32 in _view
  return handler(id, format)
File '/home/pwyf/env/wdmmg/wdmmg/lib/restapi.py', line 47 in _handle_get
  return handler(result)
File '/home/pwyf/env/wdmmg/wdmmg/controllers/dataset.py', line 49 in _view_html
  c.num_entries = logic.entry.count(**entry_query)
File '/home/pwyf/env/wdmmg/wdmmg/logic/entry.py', line 147 in count
  return browser.num_results
File '/home/pwyf/env/wdmmg/wdmmg/lib/browser.py', line 123 in num_results
  return self.results.get('response', {}).get('numFound')
File '/home/pwyf/env/wdmmg/wdmmg/lib/browser.py', line 114 in results
  self._results = self.query()
File '/home/pwyf/env/wdmmg/wdmmg/lib/browser.py', line 178 in query
  return self._query(**kw)
File '/home/pwyf/env/wdmmg/wdmmg/lib/browser.py', line 156 in _query
  response = app_globals.solr.raw_query(**kwargs)
File '/home/pwyf/env/lib/python2.7/site-packages/solrpy-0.9.4-py2.7.egg/solr/core.py', line 706 in raw_query
  return self.select.raw(**params)
File '/home/pwyf/env/lib/python2.7/site-packages/solrpy-0.9.4-py2.7.egg/solr/core.py', line 822 in raw
  rsp = conn._post(self.selector, request, conn.form_headers)
File '/home/pwyf/env/lib/python2.7/site-packages/solrpy-0.9.4-py2.7.egg/solr/core.py', line 646 in _post
  self._reconnect()
File '/home/pwyf/env/lib/python2.7/site-packages/solrpy-0.9.4-py2.7.egg/solr/core.py', line 625 in _reconnect
  self.conn.connect()
File '/usr/lib/python2.7/httplib.py', line 754 in connect
  self.timeout, self.source_address)
File '/usr/lib/python2.7/socket.py', line 571 in create_connection
  raise err
error: [Errno 111] Connection refused


CGI Variables
-------------
  CONTENT_LENGTH: '0'
  HTTP_ACCEPT: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
  HTTP_ACCEPT_CHARSET: 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'
  HTTP_ACCEPT_ENCODING: 'gzip, deflate'
  HTTP_ACCEPT_LANGUAGE: 'en-gb,en;q=0.5'
  HTTP_CONNECTION: 'keep-alive'
  HTTP_COOKIE: '__utma=96992031.1495204519.1305849247.1306229577.1308560911.9; __utmz=96992031.1305849247.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); wdmmg=c8fd08843591f1e35b39d3b85b8f7189f9e6daf70a054ef7d366aa39312a08daf813d79f; __utmb=96992031.16.10.1308560911; __utmc=96992031'
  HTTP_HOST: '127.0.0.1:5000'
  HTTP_KEEP_ALIVE: '115'
  HTTP_USER_AGENT: 'Mozilla/5.0 (X11; Linux i686; rv:2.0.1) Gecko/20100101 Firefox/4.0.1'
  PATH_INFO: '/dataset/cra'
  REMOTE_ADDR: '127.0.0.1'
  REQUEST_METHOD: 'GET'
  SERVER_NAME: '0.0.0.0'
  SERVER_PORT: '5000'
  SERVER_PROTOCOL: 'HTTP/1.1'


WSGI Variables
--------------
  application: <repoze.who.middleware.PluggableAuthenticationMiddleware object at 0xa85a1cc>
  beaker.cache: <beaker.cache.CacheManager object at 0xa85a0cc>
  beaker.get_session: <bound method SessionMiddleware._get_session of <beaker.middleware.SessionMiddleware object at 0xa851f8c>>
  beaker.session: {'_accessed_time': 1308561700.57183, '_creation_time': 1308560910.063244}
  paste.cookies: (<SimpleCookie: __utma='96992031.1495204519.1305849247.1306229577.1308560911.9' __utmb='96992031.16.10.1308560911' __utmc='96992031' __utmz='96992031.1305849247.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)' wdmmg='c8fd08843591f1e35b39d3b85b8f7189f9e6daf70a054ef7d366aa39312a08daf813d79f'>, '__utma=96992031.1495204519.1305849247.1306229577.1308560911.9; __utmz=96992031.1305849247.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); wdmmg=c8fd08843591f1e35b39d3b85b8f7189f9e6daf70a054ef7d366aa39312a08daf813d79f; __utmb=96992031.16.10.1308560911; __utmc=96992031')
  paste.httpserver.thread_pool: <paste.httpserver.ThreadPool object at 0x9fafb8c>
  paste.registry: <paste.registry.Registry object at 0xaaf0aec>
  paste.throw_errors: True
  pylons.action_method: <bound method DatasetController.view of <wdmmg.controllers.dataset.DatasetController object at 0xab260ac>>
  pylons.controller: <wdmmg.controllers.dataset.DatasetController object at 0xab260ac>
  pylons.environ_config: {'session': 'beaker.session', 'cache': 'beaker.cache'}
  pylons.pylons: <pylons.util.PylonsContext object at 0xab26e2c>
  pylons.routes_dict: {'action': u'view', 'controller': u'dataset', 'id': u'cra'}
  repoze.who.api: <repoze.who.api.API object at 0xaafb84c>
  repoze.who.logger: <logging.Logger object at 0xa85a24c>
  repoze.who.plugins: {'username': <wdmmg.lib.authenticator.UsernamePasswordAuthenticator object at 0xa85a18c>, 'apikey': <wdmmg.lib.authenticator.ApiKeyAuthenticator object at 0xa85a16c>, 'form': <FriendlyFormPlugin 176529580>, 'basicauth': <BasicAuthPlugin 176529740>, 'auth_tkt': <AuthTktCookiePlugin 176529644>}
  routes.route: <routes.route.Route object at 0xa7edfcc>
  routes.url: <routes.util.URLGenerator object at 0xab26f0c>
  webob._parsed_query_vars: (GET([]), '')
  webob.adhoc_attrs: {'language': 'en-us'}
  wsgi process: 'Multithreaded'
  wsgiorg.routing_args: (<routes.util.URLGenerator object at 0xab26f0c>, {'action': u'view', 'controller': u'dataset', 'id': u'cra'})
------------------------------------------------------------

            <p>Additionally an error occurred while sending the <weberror.reporter.EmailReporter object at 0xab2c68c> report:

            <pre>Traceback (most recent call last):
  File "/home/pwyf/env/lib/python2.7/site-packages/WebError-0.10.2-py2.7.egg/weberror/errormiddleware.py", line 450, in send_report
    rep.report(exc_data)
  File "/home/pwyf/env/lib/python2.7/site-packages/WebError-0.10.2-py2.7.egg/weberror/reporter.py", line 54, in report
    self.to_addresses, msg.as_string())
  File "/usr/lib/python2.7/smtplib.py", line 712, in sendmail
    raise SMTPRecipientsRefused(senderrs)
SMTPRecipientsRefused: {'you at yourdomain.com': (550, '5.1.1 <you at yourdomain.com>: Recipient address rejected: yourdomain.com')}
</pre>
            </p>2011-06-20 10:21:40,756 WARNI [wdmmg.lib.base] Request to /error/document took 17ms


-------------- next part --------------
20-Jun-2011 10:25:19 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: sort param field can't be found: amount
	at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:378)
	at org.apache.solr.search.QParser.getSort(QParser.java:222)
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:85)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

20-Jun-2011 10:25:19 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={sort=score+desc,+amount+desc&start=0&stats=true&q=*:*&stats.field=amount&wt=json&fq=%2Bis_aggregate:false&fq=%2Bdataset.name:cra&fq=%2Bis_aggregate:false&rows=0} status=400 QTime=3 


