[ckan-dev] Offener Haushalt data audit example and mongoaudit tool
Stefan Urbanek
stefan at knowerce.com
Wed Dec 15 18:10:38 UTC 2010
Hi,
To better understand the wdmmg data, I've created a MongoDB auditing tool. It produces output like this:
pdf_link:
    storage type: unicode
    present values: 22 (95.65%)
    null: 0 (0.00% of records, 0.00% of values)
    empty strings: 0
...
flow:
    storage type: unicode
    present values: 19248 (100.00%)
    null: 0 (0.00% of records, 0.00% of values)
    empty strings: 0
    distinct values:
        'spending'
        'income'
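For anyone curious how such per-field statistics can be computed, here is a minimal sketch over plain Python dicts (which is what a MongoDB query hands back). It is not the mongoaudit source; the function names are hypothetical, and it assumes that "present values" counts records where the key exists (including nulls, which is why null is also reported as a percentage "of values"), that "storage type" is the most common Python type name of the non-null values (Python 2 reports strings as "unicode", Python 3 as "str"), and that distinct values are listed only while their number stays within the threshold.

```python
from collections import Counter


def _hashable(value):
    # MongoDB documents can hold lists and sub-documents, which are not
    # hashable; fall back to their repr so they can still be counted.
    try:
        hash(value)
        return value
    except TypeError:
        return repr(value)


def audit_records(records, threshold=10):
    """Hypothetical sketch: per-field statistics similar to the output above."""
    records = list(records)
    total = len(records)
    stats = {}
    for record in records:
        for key, value in record.items():
            s = stats.setdefault(key, {"present": 0, "null": 0, "empty": 0,
                                       "types": Counter(), "distinct": set(),
                                       "overflow": False})
            s["present"] += 1  # the key exists, even if its value is null
            if value is None:
                s["null"] += 1
                continue
            s["types"][type(value).__name__] += 1
            if value == "":
                s["empty"] += 1
            if not s["overflow"]:
                s["distinct"].add(_hashable(value))
                if len(s["distinct"]) > threshold:
                    # Too many distinct values to list; stop collecting them.
                    s["overflow"] = True
                    s["distinct"] = set()

    report = {}
    for key, s in stats.items():
        present = s["present"]
        report[key] = {
            "storage_type": s["types"].most_common(1)[0][0] if s["types"] else None,
            "present": present,
            "present_pct": 100.0 * present / total,
            "null": s["null"],
            "null_pct_records": 100.0 * s["null"] / total,
            "null_pct_values": 100.0 * s["null"] / present,
            "empty_strings": s["empty"],
            "distinct_values": None if s["overflow"]
                               else sorted(s["distinct"], key=repr),
        }
    return report
```

Against a live collection one could pass something like pymongo.MongoClient(host, port)[database][collection].find() as records; note that list() pulls every document into memory, so a real tool would rather stream them.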
More examples:
http://democracyfarm.org/f/wdmmg/mongoaudit/
If anyone would like to try it on other datasets (such as the wdmmg UK dataset), here is the source:
https://github.com/Stiivi/brewery-py
Just install it with

    python setup.py install

and you will get the mongoaudit tool:
usage: mongoaudit [-h] [-H HOST] [-p PORT] [-t THRESHOLD] [-f {text,json}]
                  database collection

Audit a MongoDB collection

positional arguments:
  database
  collection

optional arguments:
  -h, --help            show this help message and exit
  -H HOST, --host HOST  host
  -p PORT, --port PORT  port
  -t THRESHOLD, --threshold THRESHOLD
                        threshold for number of distinct values (default is
                        10)
  -f {text,json}, --format {text,json}
                        output format (default is text)
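The help text above has the layout generated by Python's standard argparse module. A parser definition along these lines would give the same command-line interface; it is a reconstruction from the help output only, so details such as the port being parsed as an integer and the host having no default are assumptions.

```python
import argparse


def build_parser():
    # Reconstructed from the help text; not taken from the brewery-py source.
    parser = argparse.ArgumentParser(prog="mongoaudit",
                                     description="Audit a MongoDB collection")
    parser.add_argument("database")
    parser.add_argument("collection")
    parser.add_argument("-H", "--host", help="host")
    parser.add_argument("-p", "--port", type=int, help="port")  # int: assumption
    parser.add_argument("-t", "--threshold", type=int, default=10,
                        help="threshold for number of distinct values "
                             "(default is 10)")
    parser.add_argument("-f", "--format", choices=["text", "json"],
                        default="text", help="output format (default is text)")
    return parser
```

argparse adds -h/--help on its own, which is why it is not declared; the uppercase -H is a separate option and does not clash with it.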
Examples:
    mongoaudit wdmmg entities
    mongoaudit --format json wdmmg entries
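The two example invocations differ only in output format. As a rough sketch of how one audit result can be rendered either way, the function below prints the text layout from the sample output or dumps JSON. The report structure (keys such as present_pct or distinct_values) is hypothetical; the JSON that mongoaudit itself emits is not shown here and may look different.

```python
import json


def render(report, fmt="text"):
    """Render a per-field audit report (hypothetical structure) as text or JSON."""
    if fmt == "json":
        return json.dumps(report, indent=2, sort_keys=True)
    lines = []
    for field in sorted(report):
        s = report[field]
        lines.append(f"{field}:")
        lines.append(f"    storage type: {s['storage_type']}")
        lines.append(f"    present values: {s['present']} ({s['present_pct']:.2f}%)")
        lines.append(f"    null: {s['null']} ({s['null_pct_records']:.2f}% of records, "
                     f"{s['null_pct_values']:.2f}% of values)")
        lines.append(f"    empty strings: {s['empty_strings']}")
        # Distinct values are only listed when the audit kept them (below threshold).
        if s.get("distinct_values") is not None:
            lines.append("    distinct values:")
            lines.extend(f"        {v!r}" for v in s["distinct_values"])
    return "\n".join(lines)
```

Feeding it the numbers of the flow field from the sample reproduces that block of text output.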
If you need any help, or have any suggestions or questions, let me know.
Stefan
freelance consultant, analyst
knowerce
http://www.knowerce.sk