[okfn-labs] Public domain US legal data and code

Eric Mill eric at sunlightfoundation.com
Fri Oct 5 14:57:44 UTC 2012


Hi all,

I've been working for the last month or two with Josh Tauberer (of
GovTrack.us <http://govtrack.us>) and Derek Willis on a project to produce
a public domain scraper and dataset from THOMAS.gov <http://thomas.gov>,
the official source for legislative information for the US Congress.

It's a reasonably well documented set of Python scripts, which you can find
here:
https://github.com/unitedstates/congress

We just hit a great milestone - the scraper now collects everything important
that THOMAS has on bills, back to the year THOMAS's coverage starts (1973). We've published and
documented <https://github.com/unitedstates/congress/wiki> all of this data
in bulk, and I've worked it into Sunlight's pipeline, so that searches for
bills in Scout <https://scout.sunlightfoundation.com/search/federal_bills/freedom%20of%20information> use
data collected directly from this effort.

The data and code are all hosted on GitHub in a
"unitedstates" <https://github.com/unitedstates/>
organization, which is right now co-owned by me, Josh, and Derek - the
intent is to have this all exist in a common space. To the extent that the
code needs a license at all, I'm using a public domain
"unlicense" <https://github.com/unitedstates/congress/blob/master/LICENSE>
that should at least be sufficient for the US (other suggestions welcome).

There's other great stuff in this organization, too - Josh made an amazing
donation of his legislator
dataset <https://github.com/unitedstates/congress-legislators>,
and converted it to YAML for easy reuse. I've worked that dataset into
Sunlight's products already as well. I've also moved my legal citation
extractor <https://github.com/unitedstates/citation> into this organization
-- and my colleague Thom Neale has an in-progress parser for the US
Code <https://github.com/unitedstates/uscode>,
to convert it from binary typesetting codes into JSON.
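
As a rough illustration of what legal citation extraction involves, here is
a minimal Python sketch. It is not the API of the citation project linked
above: the regular expression, the function name extract_usc_citations, and
the output fields are all assumptions made up for this example, and it only
handles simple US Code citations such as "5 U.S.C. 552(b)(3)".

```python
import re

# Hypothetical pattern (an assumption for illustration, not taken from any
# real project): a title number, some spelling of "U.S.C.", an optional
# section sign, a section number, and optional parenthesized subsections.
USC_PATTERN = re.compile(
    r"(?P<title>\d+)\s+U\.?\s?S\.?\s?C\.?\s+"
    r"(?:§+\s*)?"
    r"(?P<section>\d+[a-z]?)"
    r"(?P<subsections>(?:\([a-zA-Z0-9]+\))*)"
)


def extract_usc_citations(text):
    """Return one dict per simple US Code citation found in text."""
    results = []
    for m in USC_PATTERN.finditer(text):
        results.append({
            "title": m.group("title"),
            "section": m.group("section"),
            # Split "(b)(3)" into ["b", "3"]; empty list if none given.
            "subsections": re.findall(r"\(([a-zA-Z0-9]+)\)",
                                      m.group("subsections")),
            "match": m.group(0),
        })
    return results


if __name__ == "__main__":
    sample = "See 5 U.S.C. 552(b)(3) and 42 USC § 1983."
    for citation in extract_usc_citations(sample):
        print(citation)
```

A real extractor has to cope with many more citation styles (ranges, "et
seq.", public laws, the CFR, and so on), so treat this only as a picture of
the general approach.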

GitHub's organization structure actually makes possible a very neat
commons. I'm hoping this model proves useful, both for us and for the
public.

-- Eric

-- 
Developer | sunlightfoundation.com