[ckan-dev] URL issue in ckanext-qa/archiver

David Read david.read at hackneyworkshop.com
Tue Sep 25 11:22:31 UTC 2012


I just wanted to note an issue we have with running CKAN on Python
2.6, which is prompting us to move to Python 2.7, in case other people
hit it too or someone has another suggestion.

It seems there is a bug in Python 2.6's URL/HTTP handling (urllib2 and
httplib, according to the traceback) when parsing URLs with lax
encoding. See the error below: the URL embeds another URL, colon
included, in its path. This occurs in ckanext-qa and ckanext-archiver
when they download resources.
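To see where a standards-following parser puts that colon, here is a small sketch using the standard-library `urlsplit` (Python 3's `urllib.parse`, falling back to Python 2's `urlparse`). It shows that the host ends at the first '/', so the embedded `http:` is path data and no port is involved. It only illustrates parsing; it does not reproduce the urllib2/httplib code path in the traceback.

```python
# Python 2/3 compatible import of the standard URL splitter
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2

URL = ('http://webarchive.nationalarchives.gov.uk/+/http://www.homeoffice.gov.uk/'
       'publications/science-research-statistics/research-statistics/'
       'drugs-alcohol-research/hosb1310/hosb1310-ann2tabs?view=Binary')

parts = urlsplit(URL)
# The network location stops at the first '/', so "http:" after "/+/" is part of the path.
print(parts.netloc)        # webarchive.nationalarchives.gov.uk
print(parts.port)          # None: the URL specifies no port
print(':' in parts.path)   # True: the colon lives in the path
print(parts.query)         # view=Binary
```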

The colon is not URL encoded, but all browsers accept it as-is (and
RFC 3986 does allow a colon in a URL path), so we need to cope with it
too. data.gov.uk has a lot of links like this into the web archive.
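One possible way to cope, sketched here under assumptions: percent-encode any colon in the path and query before handing the URL to the HTTP library, leaving the scheme and host untouched. `encode_colons` is a hypothetical helper, not part of ckanext-qa, ckanext-archiver or requests, and the approach assumes the web archive treats `%3A` the same as a literal `:`, which would need checking.

```python
# Python 2/3 compatible imports from the standard library
try:
    from urllib.parse import urlsplit, urlunsplit, quote  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit  # Python 2
    from urllib import quote

# Everything RFC 3986 allows unescaped in a path or query, minus ':'.
# '%' is included so already-encoded sequences are not double-encoded.
SAFE_CHARS = "/?@!$&'()*+,;=%~"


def encode_colons(url):
    """Hypothetical helper: percent-encode ':' in the path and query only."""
    parts = urlsplit(url)
    path = quote(parts.path, safe=SAFE_CHARS)
    query = quote(parts.query, safe=SAFE_CHARS)
    # Scheme, host (including any explicit port) and fragment are kept as-is.
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

For the URL in the traceback this turns `/+/http://www.homeoffice.gov.uk/...` into `/+/http%3A//www.homeoffice.gov.uk/...`, while an explicit port such as `example.org:8080` in the host is left alone.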

Dave

>>> import requests
>>> requests.head('http://webarchive.nationalarchives.gov.uk/+/http://www.homeoffice.gov.uk/publications/science-research-statistics/research-statistics/drugs-alcohol-research/hosb1310/hosb1310-ann2tabs?view=Binary')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/api.py", line 116, in head
    return request('HEAD', url, **kwargs)
  File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/api.py", line 84, in request
    r.send()
  File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 377, in send
    self._build_response(why, is_error=True)
  File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 250, in _build_response
    request.send()
  File "/home/co/pyenv_dgu/lib/python2.6/site-packages/requests/models.py", line 349, in send
    resp = opener(req, timeout=self.timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1161, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.6/urllib2.py", line 1107, in do_open
    h = http_class(host, timeout=req.timeout) # will parse host:port
  File "/usr/lib/python2.6/httplib.py", line 657, in __init__
    self._set_hostport(host, port)
  File "/usr/lib/python2.6/httplib.py", line 682, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: ''
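Reading the last frames, the host string httplib receives apparently ends in a bare ':' (the reported port is empty). As far as I know, Python 2.7's httplib treats an empty port as the default port, where 2.6 raises `InvalidURL`, which would fit with 2.7 avoiding the error. The sketch below is a simplified re-creation of that host:port split for illustration only, not the real httplib source; `split_host_port`, `allow_empty_port` and the local `InvalidURL` class are hypothetical names.

```python
DEFAULT_HTTP_PORT = 80


class InvalidURL(ValueError):
    """Hypothetical stand-in for httplib.InvalidURL."""


def split_host_port(host, default_port=DEFAULT_HTTP_PORT, allow_empty_port=False):
    """Split 'host[:port]' roughly the way httplib does.

    allow_empty_port=False mimics the strict behaviour in the traceback;
    allow_empty_port=True treats 'host:' the same as 'host'.
    """
    i = host.rfind(':')
    j = host.rfind(']')  # IPv6 literals look like [::1]
    if i > j:
        port_text = host[i + 1:]
        try:
            port = int(port_text)
        except ValueError:
            if allow_empty_port and port_text == '':
                port = default_port  # 'example.org:' means the default port
            else:
                raise InvalidURL("nonnumeric port: '%s'" % port_text)
        host = host[:i]
    else:
        port = default_port
    # Strip the brackets from an IPv6 literal
    if host and host[0] == '[' and host[-1] == ']':
        host = host[1:-1]
    return host, port
```

With the strict setting, `split_host_port('example.org:')` raises `InvalidURL("nonnumeric port: ''")`, matching the last line of the traceback; with `allow_empty_port=True` it returns `('example.org', 80)`.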
