[okfn-labs] Labs newsletter: 14 November, 2013

Neil Ashton neil.ashton at okfn.org
Thu Nov 14 15:03:52 UTC 2013


Hi all,

Labs was bristling with discussion and creation this week, with
improvements to two projects, interesting conversations around a few
others, and an awesome new post on the blog.

*## Data Pipes: lots of improvements*

Data Pipes <http://datapipes.okfnlabs.org/> is a Labs project that provides
a web API for a set of simple data-transforming operations that can be
chained together in the style of Unix pipes.

This past week, Andy Lulham <https://github.com/andylolz> has made a
*huge* number of improvements to Data Pipes. Just a few of the new
features and fixes:

   - new operations: strip (removes empty rows), tail (truncates the dataset
   to its last rows)
   - new features: a range function and a "complement" switch for cut;
   options for grep
   - all operations in a pipeline are now trimmed of whitespace
   - basic tests have been added

Have a look at the closed issues
<https://github.com/okfn/datapipes/issues?page=1&state=closed> to see more
of what Andy has been up to.
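
To make the idea of chained operations concrete, here is a minimal Python sketch of a few pipe-style steps applied to rows of data. It is not the Data Pipes implementation or its web API: the function signatures, the `run_pipeline` helper, and the sample rows are all assumptions made for illustration, and `cut` here is modeled on the Unix command (select columns, with a complement switch to invert the selection).

```python
import re


def strip(rows):
    """Drop rows whose cells are all empty or whitespace."""
    return [row for row in rows if any(cell.strip() for cell in row)]


def tail(rows, n=10):
    """Keep only the last n rows."""
    return rows[-n:] if n > 0 else []


def grep(rows, pattern, ignore_case=False):
    """Keep rows in which at least one cell matches the pattern."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [row for row in rows if any(regex.search(cell) for cell in row)]


def cut(rows, columns, complement=False):
    """Keep the listed column indexes, or drop them if complement is True."""
    selected = set(columns)
    return [
        [cell for i, cell in enumerate(row) if (i in selected) != complement]
        for row in rows
    ]


def run_pipeline(rows, steps):
    """Apply each (operation, keyword arguments) step in order (hypothetical helper)."""
    for operation, kwargs in steps:
        rows = operation(rows, **kwargs)
    return rows


# Made-up sample data, including one empty row.
rows = [
    ["city", "country", "population"],
    ["", "", ""],
    ["London", "UK", "8300000"],
    ["Paris", "FR", "2200000"],
    ["Leeds", "UK", "750000"],
]

result = run_pipeline(rows, [
    (strip, {}),
    (grep, {"pattern": "UK"}),
    (cut, {"columns": [1], "complement": True}),
    (tail, {"n": 1}),
])
print(result)
```

Each step takes rows and returns rows, which is what lets the operations be chained in any order.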

*## Webshot: new homepage and feature*

Last week we introduced you to Webshot <http://webshot.okfnlabs.org/>, a
web API for screenshots of web pages.

Back then, Webshot's home page was just a screenshot of GitHub. Now Webshot
has a proper home page <http://webshot.okfnlabs.org/> with a form interface
to the API.

Webshot has also added support for *full page* screenshots. Now you can
capture the whole page rather than just its visible portion.

*## On the blog: natural language processing with Python*

Labs member Tarek Amr <http://tarekamr.appspot.com/> has contributed an
awesome post on Python natural language processing
<http://okfnlabs.org/blog/2013/11/11/python-nlp.html> with NLTK to the Labs
blog.

"The beauty of NLP," Tarek says, "is that it enables computers to extract
knowledge from unstructured data inside textual documents." Read his post
to learn how to do text normalization, frequency analysis, and text
classification with Python.
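
For a rough feel of what text normalization and frequency analysis involve, here is a small sketch using only the Python standard library. It is not taken from Tarek's post and does not use NLTK; the stopword list, the function names, and the sample sentence are assumptions chosen for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "of", "to", "is", "and", "in", "that", "it"}


def normalize(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def word_frequencies(text):
    """Count how often each normalized token occurs."""
    return Counter(normalize(text))


sample = "Open data is data anyone can use. Open knowledge is what open data becomes."
freqs = word_frequencies(sample)
print(freqs.most_common(2))
```

Counts like these are a common starting point for features in text classification.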

*## Data Packages workflow à la Node*

Wouldn't it be nice to be able to initialize new Data Packages
<http://data.okfn.org/standards/data-package> as easily as you can
initialize a Node module with npm?

Max Ogden <http://www.gittip.com/maxogden/> started a discussion thread
<https://github.com/okfn/datapackage.js/issues/3> around this enticing
idea, which eventually led Rufus Pollock <http://rufuspollock.org> to start
a new repo for dpm <https://github.com/okfn/dpm>, the Data Package Manager.
Check out dpm's Issues <https://github.com/okfn/dpm/issues> to see what
needs to happen next.
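
As a rough illustration of what an npm-style "init" step for a Data Package could do, the sketch below writes a minimal `datapackage.json` listing the CSV files in a directory. This is not dpm's code or command-line interface: the function name `init_data_package`, the choice of fields (`name` plus a `resources` list of `path` entries), and the sample file are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path


def init_data_package(directory, name):
    """Write a minimal datapackage.json for the CSV files in `directory` (hypothetical helper)."""
    directory = Path(directory)
    descriptor = {
        "name": name,
        # One resource entry per CSV file, in alphabetical order.
        "resources": [{"path": p.name} for p in sorted(directory.glob("*.csv"))],
    }
    target = directory / "datapackage.json"
    target.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return descriptor


# Try it out in a throwaway directory with one made-up CSV file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "budget.csv").write_text("year,amount\n2013,100\n", encoding="utf-8")
    descriptor = init_data_package(tmp, "example-budget")
    written = json.loads((Path(tmp) / "datapackage.json").read_text(encoding="utf-8"))

print(json.dumps(descriptor, indent=2))
```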

*## Nomenklatura: looking forward*

Nomenklatura <http://nomenklatura.okfnlabs.org/> does data reconciliation,
making it possible "to maintain a canonical list of entities such as
persons, companies or even streets and to match messy input, such as their
names, against that canonical list".
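
The matching step can be sketched in a few lines of Python using the standard library's `difflib`. This is only an illustration of the general idea, not how Nomenklatura works internally: the `reconcile` function, the similarity cutoff, and the sample entity list are all assumptions.

```python
from difflib import get_close_matches


def reconcile(name, canonical, cutoff=0.6):
    """Return the canonical entity closest to a messy name, or None if nothing is close enough."""
    # Compare in lowercase, but return the original canonical spelling.
    lookup = {entity.lower(): entity for entity in canonical}
    matches = get_close_matches(name.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


# A made-up canonical list for illustration.
canonical = ["Open Knowledge Foundation", "School of Data", "OpenSpending"]

match = reconcile("open knowlege foundation ", canonical)
no_match = reconcile("zzz", canonical)
print(match, no_match)
```

A real reconciliation service also has to deal with aliases, ambiguous matches, and human review, none of which this sketch covers.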

Friedrich Lindenberg <http://pudo.org/> has noted on the Labs mailing list
that Nomenklatura has some serious problems
<http://lists.okfn.org/pipermail/okfn-labs/2013-November/001138.html>,
and he has proposed "a fairly radical re-framing of the service".

The conversation about what this re-framing should look like is still
underway. Check out the discussion thread
<http://lists.okfn.org/pipermail/okfn-labs/2013-November/001138.html> and
jump in with your ideas.

*## Data Issues: following issues*

Last week, the idea of Data Issues
<http://okfnlabs.org/blog/2013/11/06/tracking-data-issues.html> was
floated: using GitHub Issues to track problems with public datasets.
The idea has generated some conversation, and we'd love to hear more.

Discussion on the Labs list highlighted another benefit of using GitHub.
Alioune Dia <https://github.com/aliounedia> suggested that Data Issues
should let users register to be notified when a particular issue is fixed.
But Chris Mear <http://t.co/uxWokfMXJs> pointed out that GitHub already
makes this possible: "Any GitHub user can 'follow' a specific issue by
using the notification button at the bottom of the issue page."

*## Get involved*

Anyone can join the Labs community and get involved! Read more about how
you can join the community <http://okfnlabs.org/join/> and participate by
coding, wrangling data, or doing outreach and engagement. Also check out
the Ideas Page <http://okfnlabs.org/ideas/> to see what's cooking in the
Labs.
-- 
Neil Ashton

Technical Writer and Analyst  | skype: nmashton

The Open Knowledge Foundation <http://okfn.org/>
*Empowering through Open Knowledge*
http://okfn.org/ | @okfn <http://twitter.com/okfn> | OKF on Facebook
<https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/> |
Newsletter <http://okfn.org/about/newsletter>

OpenSpending | http://openspending.org/ | @openspending
<http://twitter.com/openspending> | Tracking every government financial
transaction across the world
School of Data | http://schoolofdata.org | @schoolofdata
<http://twitter.com/schoolofdata> | Evidence is Power