[ckan-dev] URL issue in ckanext-qa/archiver

David Read david.read at hackneyworkshop.com
Tue Sep 25 16:11:18 UTC 2012


Just to say that a later version of 'requests' also solves this
problem (it uses urllib3 instead of urllib2). I'm updating ckanext-qa
and ckanext-archiver to use this newer version.
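
For anyone who cannot upgrade straight away, the point in the quoted
message below (that the colon should strictly be URL-encoded) could be
handled by cleaning the URL before it is requested. This is only a rough
sketch, written against the Python 3 standard library rather than the
Python 2.6 setup in the traceback. The helper name encode_lax_path is
made up for illustration and is not part of ckanext-qa or
ckanext-archiver, and whether the web archive serves the encoded form
of the link has not been tested here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_lax_path(url):
    """Percent-encode characters such as ':' in the path part of a URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Leave '/', '+' and existing '%' escapes alone so that nothing is
    # double-encoded; other reserved characters (e.g. ':') are quoted.
    path = quote(parts.path, safe="/+%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


# The URL from the quoted report, split over several lines for readability.
url = (
    "http://webarchive.nationalarchives.gov.uk/+/"
    "http://www.homeoffice.gov.uk/publications/"
    "science-research-statistics/research-statistics/"
    "drugs-alcohol-research/hosb1310/hosb1310-ann2tabs?view=Binary"
)
print(encode_lax_path(url))
```

Only the path is touched, so the host and the ?view=Binary query string
pass through unchanged.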

David

On 25 September 2012 12:22, David Read <david.read at hackneyworkshop.com> wrote:
> I just wanted to note an issue we have with running CKAN on Python
> 2.6, prompting us to move to Python 2.7, in case other people have
> this issue too, or someone has another suggestion.
>
> It seems there is a bug in the URL/HTTP handling when parsing URLs
> with lax encoding. See the error below, which has a URL with a colon
> in the URL parameter. This occurs in ckanext-qa and ckanext-archiver
> when downloading resources.
>
> Strictly, the colon should be URL-encoded, but since all browsers
> accept it, we need to cope with it too. data.gov.uk has a lot of
> these links stored in the web archive like this.
>
> Dave
>
> >>> requests.head('http://webarchive.nationalarchives.gov.uk/+/http://www.homeoffice.gov.uk/publications/science-research-statistics/research-statistics/drugs-alcohol-research/hosb1310/hosb1310-ann2tabs?view=Binary')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/api.py", line 116, in head
>     return request('HEAD', url, **kwargs)
>   File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/api.py", line 84, in request
>     r.send()
>   File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 377, in send
>     self._build_response(why, is_error=True)
>   File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 250, in _build_response
>     request.send()
>   File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 349, in send
>     resp = opener(req, timeout=self.timeout)
>   File "/usr/lib/python2.6/urllib2.py", line 391, in open
>     response = self._open(req, data)
>   File "/usr/lib/python2.6/urllib2.py", line 409, in _open
>     '_open', req)
>   File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
>     result = func(*args)
>   File "/usr/lib/python2.6/urllib2.py", line 1161, in http_open
>     return self.do_open(httplib.HTTPConnection, req)
>   File "/usr/lib/python2.6/urllib2.py", line 1107, in do_open
>     h = http_class(host, timeout=req.timeout) # will parse host:port
>   File "/usr/lib/python2.6/httplib.py", line 657, in __init__
>     self._set_hostport(host, port)
>   File "/usr/lib/python2.6/httplib.py", line 682, in _set_hostport
>     raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
> httplib.InvalidURL: nonnumeric port: ''
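
The last frame of the traceback builds its message from host[i+1:], so
the empty quotes in "nonnumeric port: ''" suggest that the host string
ended in a colon with nothing after it. A simplified sketch of that kind
of host:port split is below. It is not the httplib source: the function
name split_hostport and the local InvalidURL class are stand-ins, IPv6
brackets are ignored, and it is written for Python 3.

```python
class InvalidURL(Exception):
    """Local stand-in for httplib.InvalidURL (assumption for this sketch)."""


def split_hostport(host, default_port=80):
    """Split 'host:port' into (host, port), roughly as the traceback implies."""
    i = host.rfind(":")
    if i < 0:
        # No colon at all: fall back to the default port.
        return host, default_port
    try:
        port = int(host[i + 1:])
    except ValueError:
        # An empty or non-numeric suffix ends up here, as in the report.
        raise InvalidURL("nonnumeric port: '%s'" % host[i + 1:])
    return host[:i], port


print(split_hostport("example.com:8080"))
try:
    split_hostport("example.com:")
except InvalidURL as exc:
    print(exc)
```

A host with a trailing colon fails because int('') raises ValueError,
which is then reported with an empty port string.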



