[okfn-labs] Find country names in blobs of unknown text

Friedrich Lindenberg friedrich at pudo.org
Fri Jun 13 17:04:30 UTC 2014


On that note: http://opennames.org/datasets/iso-countries 

- Friedrich 

On 13 Jun 2014, at 19:03, Michael Bauer <michael.bauer at okfn.org> wrote:

> Thomas,
> 
> Could you use Opennames (hrm. Nomenklatura) for something like this?
> 
> e.g. add in the ISO country list and then work on alternate spellings?
> 
> Michael
> 
> On Fri, Jun 13, 2014 at 11:54:28AM -0400, Thomas Levine wrote:
>> I'm looking for a function or regular expression that finds country names in blobs of text.
>> This can just be something that does a bunch of exact string matches so that it doesn't matter
>> whether the source blob (company names in my case) is spelled "Aecom New Zealand Limited",
>> "Aecom (New Zealand)", "Aecom, New Zealand", or "New Zealand". Has someone released something
>> like this?
>> 
>> If I don't see an answer soon, I'm going to write a regular expression that matches with a
>> bunch of country names from some country name dataset.
>> _______________________________________________
>> okfn-labs mailing list
>> okfn-labs at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/okfn-labs
>> Unsubscribe: https://lists.okfn.org/mailman/options/okfn-labs
> 
> -- 
> Data Diva | skype: mihi_tr | @mihi_tr
> Open Knowledge | School of Data
> http://okfn.org | http://schoolofdata.org 
> GPG/PGP key: http://tentacleriot.eu/mihi.asc
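
The exact-match approach Thomas describes in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions: the short COUNTRY_NAMES list is a hand-picked sample (a real list would be loaded from a country-name dataset such as the ISO list linked above), and the helper names build_country_pattern and find_countries are hypothetical, not part of Opennames or Nomenklatura.

```python
import re

# Illustrative sample only (assumption); a real list would come from a
# country-name dataset such as the ISO country list.
COUNTRY_NAMES = ["New Zealand", "Niger", "Nigeria", "United Kingdom", "Germany"]


def build_country_pattern(names):
    """Compile one case-insensitive regex that matches any of the names."""
    # Try longer names first so the alternation prefers the longest match.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # The lookarounds reject matches embedded in a longer word,
    # e.g. "Niger" inside "Nigerian".
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def find_countries(text, pattern, names):
    """Return the canonical spelling of each country name found in text."""
    canonical = {name.lower(): name for name in names}
    return [canonical[m.group(0).lower()] for m in pattern.finditer(text)]


pattern = build_country_pattern(COUNTRY_NAMES)
for blob in ["Aecom New Zealand Limited", "Aecom (New Zealand)",
             "Aecom, New Zealand", "New Zealand"]:
    print(blob, "->", find_countries(blob, pattern, COUNTRY_NAMES))
```

Alternate spellings, as Michael suggests, would need extra entries mapped to the same canonical name; this sketch only handles exact, case-insensitive matches.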
