[okfn-labs] SEC EDGAR database

Rufus Pollock rufus.pollock at okfn.org
Tue Mar 4 17:21:59 UTC 2014


Hi All,

In my spare time recently I've been looking at extracting information
from the SEC's EDGAR database. I've just written up some initial findings
here:

http://okfnlabs.org/blog/2014/03/04/sec-edgar-database.html

There's also a longstanding dataset on the DataHub from which some of this
info is drawn:

http://datahub.io/dataset/edgar

And a data package on GitHub (in progress):

https://github.com/datasets/edgar

Next up for me is scripting to automate the extraction of info, especially
from XBRL, and I wondered if anyone else has played around with this. For
XBRL from EDGAR I've found a few (Python) libs so far, including:

* https://github.com/andrewkittredge/financial_fundamentals - seems quite
clean and usable
* https://github.com/lukerosiak/pysec - also quite good and from someone
working at Sunlight Labs but I've struggled to get it working on
* http://arelle.org/ - powerful (does much more than parsing) and looks
well maintained, but seems quite complex
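
As a rough illustration of what "extracting info from XBRL" involves, here
is a minimal sketch that uses only the Python standard library and none of
the libraries listed above. It relies on two things from the XBRL 2.1
instance format: facts are elements carrying a contextRef attribute, and
each context holds a period. The sample document, the ex: taxonomy
namespace, the placeholder CIK and the function names are all made up for
illustration; real EDGAR filings are much larger and also need
dimensions, tuples and footnotes handled, which this ignores.

```python
import xml.etree.ElementTree as ET

# Namespace of the XBRL 2.1 instance schema.
XBRLI = "http://www.xbrl.org/2003/instance"

# Made-up instance document for illustration only; not real filing data.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
      xmlns:ex="http://example.com/taxonomy">
  <context id="FY2013">
    <entity>
      <identifier scheme="http://www.sec.gov/CIK">0000000000</identifier>
    </entity>
    <period>
      <startDate>2013-01-01</startDate>
      <endDate>2013-12-31</endDate>
    </period>
  </context>
  <unit id="USD"><measure>iso4217:USD</measure></unit>
  <ex:Revenues contextRef="FY2013" unitRef="USD" decimals="0">1000</ex:Revenues>
  <ex:NetIncomeLoss contextRef="FY2013" unitRef="USD" decimals="0">250</ex:NetIncomeLoss>
</xbrl>
"""


def split_tag(tag):
    """Split an ElementTree tag '{namespace}local' into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def extract_facts(xml_text):
    """Return the top-level facts of an XBRL instance as a list of dicts."""
    root = ET.fromstring(xml_text)

    # Map each context id to its period (instant, or startDate/endDate).
    periods = {}
    for ctx in root.findall(f"{{{XBRLI}}}context"):
        period = ctx.find(f"{{{XBRLI}}}period")
        children = list(period) if period is not None else []
        periods[ctx.get("id")] = {
            split_tag(child.tag)[1]: (child.text or "").strip()
            for child in children
        }

    facts = []
    for element in root:
        context_id = element.get("contextRef")
        if context_id is None:
            continue  # contexts, units, schemaRef etc. are not facts
        namespace, name = split_tag(element.tag)
        facts.append({
            "namespace": namespace,
            "name": name,
            "value": (element.text or "").strip(),
            "unit": element.get("unitRef"),
            "period": periods.get(context_id, {}),
        })
    return facts


if __name__ == "__main__":
    for fact in extract_facts(SAMPLE):
        print(fact["name"], fact["value"], fact["unit"], fact["period"])
```

Values are left as strings on purpose: deciding how to interpret decimals,
units and signs is exactly where a proper library such as the ones above
earns its keep.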

If anyone else has thoughts here or knows of good libs to use (in any
language), I'd love to hear about them. I'm also interested in historical
data: I know CorpWatch did some great work extracting info from text files
here: http://api.corpwatch.org/

Rufus