[ckan-dev] URL issue in ckanext-qa/archiver
David Read
david.read at hackneyworkshop.com
Tue Sep 25 16:11:18 UTC 2012
Just to say that a later version of 'requests' also solves this
problem (it uses urllib3 instead of urllib2). I'm updating ckanext-qa
and ckanext-archiver to use this newer version.
David
On 25 September 2012 12:22, David Read <david.read at hackneyworkshop.com> wrote:
> I just wanted to note an issue we have with running CKAN on Python
> 2.6, prompting us to move to Python 2.7, in case other people have
> this issue too, or someone has another suggestion.
>
> It seems there is a bug in the url/http handling when parsing URLs
> with lax encoding. See the error below, which has a URL with a colon
> in the URL parameter. This occurs in ckanext-qa and ckanext-archiver
> when downloading resources.
>
> Strictly speaking, the colon should be URL-encoded, but since all
> browsers accept it, we need to cope with it too. data.gov.uk has a
> lot of links like this stored in the web archive.
>
> Dave
>
> >>> requests.head('http://webarchive.nationalarchives.gov.uk/+/http://www.homeoffice.gov.uk/publications/science-research-statistics/research-statistics/drugs-alcohol-research/hosb1310/hosb1310-ann2tabs?view=Binary')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/api.py", line 116, in head
>     return request('HEAD', url, **kwargs)
>   File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/api.py", line 84, in request
>     r.send()
>   File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 377, in send
>     self._build_response(why, is_error=True)
>   File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 250, in _build_response
>     request.send()
>   File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 349, in send
>     resp = opener(req, timeout=self.timeout)
>   File "/usr/lib/python2.6/urllib2.py", line 391, in open
>     response = self._open(req, data)
>   File "/usr/lib/python2.6/urllib2.py", line 409, in _open
>     '_open', req)
>   File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
>     result = func(*args)
>   File "/usr/lib/python2.6/urllib2.py", line 1161, in http_open
>     return self.do_open(httplib.HTTPConnection, req)
>   File "/usr/lib/python2.6/urllib2.py", line 1107, in do_open
>     h = http_class(host, timeout=req.timeout) # will parse host:port
>   File "/usr/lib/python2.6/httplib.py", line 657, in __init__
>     self._set_hostport(host, port)
>   File "/usr/lib/python2.6/httplib.py", line 682, in _set_hostport
>     raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
> httplib.InvalidURL: nonnumeric port: ''
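
For anyone who cannot upgrade, the "strictly the colon should be URL
encoded" point in the quoted message could be sketched roughly as below.
This is only an illustration, not what ckanext-qa or ckanext-archiver
do: the helper name encode_lax_url is made up, it is written against
the Python 3 standard library (urllib.parse) rather than the Python
2.6 modules in the traceback, and it has not been tested against the
web archive, so whether the server accepts the encoded form is an
assumption.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_lax_url(url):
    """Percent-encode stray characters (such as ':') in the path and query.

    Hypothetical helper for illustration; not part of requests or CKAN.
    """
    parts = urlsplit(url)
    # Keep '/', '+' and existing '%' escapes untouched; a ':' in the path
    # is not in the safe set, so it becomes '%3A'.
    path = quote(parts.path, safe="/+%")
    # Keep the usual query separators and existing escapes untouched.
    query = quote(parts.query, safe="=&+%")
    # The scheme and host are passed through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

Calling encode_lax_url() on a link before handing it to
requests.head() would leave already-strict URLs unchanged, since
existing percent escapes are kept as they are.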