[ckan-dev] URL issue in ckanext-qa/archiver
David Read
david.read at hackneyworkshop.com
Tue Sep 25 11:22:31 UTC 2012
I just wanted to note an issue we have with running CKAN on Python
2.6, prompting us to move to Python 2.7, in case other people have
this issue too, or someone has another suggestion.
It seems there is a bug in Python 2.6's URL/HTTP handling (urllib2 and
httplib) when parsing URLs with lax encoding. See the error below, where
the URL contains an unencoded colon in the embedded archived URL. This
occurs in ckanext-qa and ckanext-archiver when downloading resources.
Strictly speaking, the colon should be URL-encoded, but since all
browsers accept it, we need to cope with it too. data.gov.uk has a lot
of these links, stored in the web archive like this.
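For illustration, here is a minimal sketch of what encoding the colon before making the request could look like. `encode_lax_url` is a hypothetical helper, not part of ckanext-qa, ckanext-archiver or requests, and the shortened URL in the usage note is only an example. I have not checked that the archive server treats the encoded form the same way, or that this avoids the error on Python 2.6 (for example if the failure happens while following a redirect).

```python
try:  # Python 3
    from urllib.parse import urlsplit, urlunsplit, quote
except ImportError:  # Python 2
    from urlparse import urlsplit, urlunsplit
    from urllib import quote


def encode_lax_url(url):
    """Percent-encode stray characters (such as ':') in the path of a URL.

    Hypothetical helper for illustration only.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    # '/' and '+' stay literal so the path structure is unchanged, and '%'
    # stays literal so sequences that are already encoded are not encoded twice.
    path = quote(path, safe="/+%")
    return urlunsplit((scheme, netloc, path, query, fragment))
```

With this, `encode_lax_url('http://webarchive.nationalarchives.gov.uk/+/http://www.homeoffice.gov.uk/pub?view=Binary')` yields a URL whose path contains `http%3A//` in place of `http://`, while the host and the query string are left alone.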
Dave
>>> requests.head('http://webarchive.nationalarchives.gov.uk/+/http://www.homeoffice.gov.uk/publications/science-research-statistics/research-statistics/drugs-alcohol-research/hosb1310/hosb1310-ann2tabs?view=Binary')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/api.py", line 116, in head
    return request('HEAD', url, **kwargs)
  File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/api.py", line 84, in request
    r.send()
  File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 377, in send
    self._build_response(why, is_error=True)
  File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 250, in _build_response
    request.send()
  File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 349, in send
    resp = opener(req, timeout=self.timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1161, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.6/urllib2.py", line 1107, in do_open
    h = http_class(host, timeout=req.timeout) # will parse host:port
  File "/usr/lib/python2.6/httplib.py", line 657, in __init__
    self._set_hostport(host, port)
  File "/usr/lib/python2.6/httplib.py", line 682, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: ''