[ckan-dev] Offener Haushalt data audit example and mongoaudit tool

Stefan Urbanek stefan at knowerce.com
Wed Dec 15 18:10:38 UTC 2010


Hi,

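To better understand the wdmmg data, I've created a MongoDB auditing tool. For each field in a collection it reports the storage type, how many records contain the field, the number of nulls and empty strings, and the distinct values if there are only a few. The output looks like this: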

pdf_link:
	storage type: unicode
	present values: 22 (95.65%)
	null: 0 (0.00% of records, 0.00% of values)
	empty strings: 0
...
flow:
	storage type: unicode
	present values: 19248 (100.00%)
	null: 0 (0.00% of records, 0.00% of values)
	empty strings: 0
	distinct values:
		'spending'
		'income'
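
For readers who want to see how statistics like these can be derived, here is a minimal sketch in plain Python. It is not the mongoaudit source: the function names (audit_records, format_report) and the exact counting rules are assumptions inferred from the sample output above, and it works on any iterable of dicts rather than on a live MongoDB collection. Under Python 3 the storage type prints as "str" where the output above shows "unicode".

```python
def audit_records(records, threshold=10):
    """Collect per-field statistics from an iterable of dicts.

    Hypothetical helper, not part of brewery-py. Returns (total, stats).
    """
    total = 0
    stats = {}
    for record in records:
        total += 1
        for key, value in record.items():
            field = stats.setdefault(key, {
                "types": set(),       # names of the Python types seen
                "present": 0,         # records that contain the key
                "null": 0,            # records where the value is None
                "empty_strings": 0,   # records where the value is ""
                "distinct": set(),    # distinct values, None if too many
            })
            field["present"] += 1
            if value is None:
                field["null"] += 1
                continue
            field["types"].add(type(value).__name__)
            if value == "":
                field["empty_strings"] += 1
            distinct = field["distinct"]
            # Keep at most threshold + 1 values, enough to detect overflow.
            if distinct is not None and len(distinct) <= threshold:
                try:
                    distinct.add(value)
                except TypeError:
                    # Unhashable value (list, dict): stop tracking this field.
                    field["distinct"] = None
    for field in stats.values():
        distinct = field["distinct"]
        if distinct is not None and len(distinct) > threshold:
            field["distinct"] = None
    return total, stats


def percent(part, whole):
    """Percentage of part in whole, 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def format_report(total, stats):
    """Render the statistics in a layout similar to the text output above."""
    lines = []
    for name in sorted(stats):
        f = stats[name]
        lines.append("%s:" % name)
        lines.append("\tstorage type: %s" % ", ".join(sorted(f["types"])))
        lines.append("\tpresent values: %d (%.2f%%)"
                     % (f["present"], percent(f["present"], total)))
        lines.append("\tnull: %d (%.2f%% of records, %.2f%% of values)"
                     % (f["null"], percent(f["null"], total),
                        percent(f["null"], f["present"])))
        lines.append("\tempty strings: %d" % f["empty_strings"])
        if f["distinct"]:
            lines.append("\tdistinct values:")
            for value in sorted(f["distinct"], key=repr):
                lines.append("\t\t%r" % (value,))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"flow": "spending", "pdf_link": "a.pdf"},
        {"flow": "income"},
        {"flow": "spending", "pdf_link": None},
    ]
    total, stats = audit_records(sample)
    print(format_report(total, stats))
```

The distinct values are only listed when a field has no more than "threshold" of them, which is the role the -t option appears to play in the tool.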

More examples:

	http://democracyfarm.org/f/wdmmg/mongoaudit/

If anyone would like to try it on other datasets (such as the wdmmg UK dataset), here is the source:

	https://github.com/Stiivi/brewery-py

Install it with "python setup.py install" and you will get the mongoaudit tool:

usage: mongoaudit [-h] [-H HOST] [-p PORT] [-t THRESHOLD] [-f {text,json}]
                  database collection

Audit a MongoDB collection

positional arguments:
  database
  collection

optional arguments:
  -h, --help            show this help message and exit
  -H HOST, --host HOST  host
  -p PORT, --port PORT  port
  -t THRESHOLD, --threshold THRESHOLD
                        threshold for number of distinct values (default is
                        10)
  -f {text,json}, --format {text,json}
                        output format (default is text)

Examples:
	mongoaudit wdmmg entities
	mongoaudit --format json wdmmg entries
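
The help text above has the shape argparse produces, so the interface can be approximated with a short parser definition. This is a sketch reconstructed from the help output, not the tool's actual code: the build_parser name and the int types for port and threshold are assumptions, and no default host or port is set because the help text does not state one.

```python
import argparse


def build_parser():
    """Build a parser mirroring the mongoaudit help text (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="mongoaudit",
        description="Audit a MongoDB collection",
    )
    parser.add_argument("-H", "--host", help="host")
    # int is an assumption; the help text only names the option.
    parser.add_argument("-p", "--port", type=int, help="port")
    parser.add_argument(
        "-t", "--threshold", type=int, default=10,
        help="threshold for number of distinct values (default is 10)",
    )
    parser.add_argument(
        "-f", "--format", choices=["text", "json"], default="text",
        help="output format (default is text)",
    )
    parser.add_argument("database")
    parser.add_argument("collection")
    return parser


if __name__ == "__main__":
    # Parse the second example command line shown above.
    args = build_parser().parse_args(["--format", "json", "wdmmg", "entries"])
    print(args.database, args.collection, args.format, args.threshold)
```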

If you need any help, or if you have any suggestions or questions, let me know.

Stefan

freelance consultant, analyst

knowerce
http://www.knowerce.sk


