[ckan-dev] Problems with file upload by code on Apache

Florian.Brucker at mb.karlsruhe.de
Tue May 31 10:27:30 UTC 2016



Hello everybody,

I'm trying to create a new resource (including an uploaded file) via
Python code. The use case is a custom harvester, but my problem seems
to be independent of ckanext-harvest.

Here's some code that shows the problem:

-----------------------------
#!/usr/bin/env python

import cgi
import os.path
import urllib2

import paste.deploy
from paste.registry import Registry
import pylons

from ckan.config.environment import load_environment
import ckan.plugins.toolkit as toolkit
from ckan.lib.cli import MockTranslator
from ckan.model import User


# Adapted from ckan.lib.cli.CkanCommand._load_config
def load_config(ini_path):
    ini_path = os.path.abspath(ini_path)
    conf = paste.deploy.appconfig('config:' + ini_path)
    load_environment(conf.global_conf, conf.local_conf)

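    # Set up a Pylons registry so that CKAN code which expects a request
    # context (translator, template context `c`) works outside a web request
    registry = Registry()
    registry.prepare()
    registry.register(pylons.translator, MockTranslator())

    # Act as the site user, so actions called with an empty context
    # run as that user
    registry.register(pylons.c, pylons.util.AttribSafeContextObj())
    user = toolkit.get_action('get_site_user')({'ignore_auth': True}, {})
    pylons.c.user = user['name']
    pylons.c.userobj = User.get(user['name'])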


def create_resource(f, pkg_id, name):
    # Wrap the file object in a cgi.FieldStorage, as a browser form upload
    # would provide it
    upload = cgi.FieldStorage()
    upload.filename = getattr(f, 'name', 'data')
    upload.file = f
    data_dict = {
        'package_id': pkg_id,
        'name': name,
        'upload': upload,
        'url': 'unused-but-required',
    }
    return toolkit.get_action('resource_create')({}, data_dict)


if __name__ == '__main__':
    import sys
    import StringIO
    load_config(sys.argv[1])

    PKG_ID = 'bde56c8d-c9fa-47ad-8efb-9917e6751027'

    # In-memory file standing in for a real upload
    fake_file = StringIO.StringIO('1,2,3')
    fake_file.name = 'data.csv'

    res_dict = create_resource(fake_file, PKG_ID, 'My Resource')
    print(res_dict['url'])

    # Try to download the uploaded file from the URL CKAN reports
    try:
        c = urllib2.urlopen(res_dict['url'])
    except urllib2.HTTPError as e:
        print(e)
    else:
        print(c.getcode())
-----------------------------

When I run that code against my development.ini (which uses paster
serve) then it works: The resource is created, the data is uploaded,
the URL is updated, and the file can then be downloaded from the
updated URL:

-----------------------------
$ sudo -u www-data /usr/lib/ckan/default/bin/python upload_test.py /etc/ckan/default/development.ini
/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py:79: SAWarning: Usage of the 'related attribute set' operation is not currently supported within the execution stage of the flush process. Results may not be consistent.  Consider using alternative event listeners or connection-level operations instead.
  sess._flush_warning("related attribute set")
http://172.16.16.17:5000/dataset/bde56c8d-c9fa-47ad-8efb-9917e6751027/resource/2d927d5d-05e3-487f-a61b-851e1000be64/download/data.csv
200
-----------------------------

However, if I run the code against my production.ini (which uses
Apache and sets debug = false, but is otherwise equal to
development.ini) then the final step (downloading the resource from the
updated URL) fails with a 404:

-----------------------------
$ sudo -u www-data /usr/lib/ckan/default/bin/python upload_test.py /etc/ckan/default/production.ini
/usr/lib/ckan/default/local/lib/python2.7/site-packages/sqlalchemy/orm/unitofwork.py:79: SAWarning: Usage of the 'related attribute set' operation is not currently supported within the execution stage of the flush process. Results may not be consistent.  Consider using alternative event listeners or connection-level operations instead.
  sess._flush_warning("related attribute set")
http://172.16.16.17:9000/dataset/bde56c8d-c9fa-47ad-8efb-9917e6751027/resource/827617f6-a009-4ab6-a4f2-34a983d08541/download/data.csv
HTTP Error 404: Not Found
-----------------------------
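To check which resource the failing URL actually points at, the package id, resource id and filename can be split out of its path. A minimal sketch, assuming the /dataset/<package_id>/resource/<resource_id>/download/<filename> layout seen in both URLs above; parse_download_url is a hypothetical helper, not part of CKAN:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def parse_download_url(url):
    """Split a resource download URL into package id, resource id, filename.

    Assumes the path ends in
    /dataset/<package_id>/resource/<resource_id>/download/<filename>;
    any site-root prefix before 'dataset' is ignored.
    """
    parts = urlparse(url).path.strip('/').split('/')
    tail = parts[-6:]  # only the last six path segments matter
    if (len(tail) != 6 or tail[0] != 'dataset'
            or tail[2] != 'resource' or tail[4] != 'download'):
        raise ValueError('not a resource download URL: %r' % url)
    return {
        'package_id': tail[1],
        'resource_id': tail[3],
        'filename': tail[5],
    }


url = ('http://172.16.16.17:9000/dataset/bde56c8d-c9fa-47ad-8efb-9917e6751027'
       '/resource/827617f6-a009-4ab6-a4f2-34a983d08541/download/data.csv')
print(parse_download_url(url)['resource_id'])
# prints 827617f6-a009-4ab6-a4f2-34a983d08541
```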

The resource itself is successfully created and is displayed in the web
UI (obviously a download via the web UI also fails with a 404).

The resource file is there and has the proper permissions:

-----------------------------
$ ls -l /var/lib/ckan/resources/827/617/f6-a009-4ab6-a4f2-34a983d08541
-rw-r--r-- 1 www-data www-data 5 May 31 11:44 /var/lib/ckan/resources/827/617/f6-a009-4ab6-a4f2-34a983d08541
-----------------------------
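The file name is derived from the resource id: the first three and next three characters become directories, and the rest becomes the file name. A minimal sketch of that mapping, assuming a storage path of /var/lib/ckan (inferred from the path above, not taken from the config) and the layout visible in the ls output; resource_file_path is a hypothetical helper:

```python
import os.path


def resource_file_path(storage_path, resource_id):
    """Return where an uploaded resource file is expected on disk.

    Assumes the layout visible in the ls output:
    <storage_path>/resources/<id[0:3]>/<id[3:6]>/<id[6:]>
    """
    directory = os.path.join(storage_path, 'resources',
                             resource_id[0:3], resource_id[3:6])
    return os.path.join(directory, resource_id[6:])


# '/var/lib/ckan' is an assumed storage path, inferred from the path above
path = resource_file_path('/var/lib/ckan', '827617f6-a009-4ab6-a4f2-34a983d08541')
print(path)  # /var/lib/ckan/resources/827/617/f6-a009-4ab6-a4f2-34a983d08541
# True only if the file exists and the *current* process may read it
print(os.path.isfile(path) and os.access(path, os.R_OK))
```

The readability check is only meaningful when run as the same user as the process serving the download.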

Interestingly, if I edit the resource in the web UI and submit the form
without making any changes then the download starts working! I've
compared the output of resource_show before and after the edit, and the
resource itself hasn't changed.
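A comparison like this can be done mechanically rather than by eye. A minimal sketch of a top-level dict diff; diff_dicts and MISSING are hypothetical names, and nested values are compared with == rather than recursed into:

```python
MISSING = object()  # sentinel for "key not present on this side"


def diff_dicts(before, after):
    """Return {key: (old, new)} for top-level keys whose values differ.

    Keys present on only one side are reported with MISSING on the other.
    """
    diff = {}
    for key in set(before) | set(after):
        old = before.get(key, MISSING)
        new = after.get(key, MISSING)
        if old != new:
            diff[key] = (old, new)
    return diff


print(sorted(diff_dicts({'a': 1, 'b': 2}, {'a': 1, 'b': 3, 'c': 4})))
# prints ['b', 'c']
```

In the real case, before and after would be the resource_show results captured around the no-op edit; the same helper applies to package_show output, since the package edit form also fixes the download.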

Similarly, submitting the resource's *package* edit form without any
changes also makes the download start working.

However, simulating that no-op resource edit by passing the dict returned
from resource_create to resource_update does *not* fix the download
problem.

I didn't find anything interesting in the log files, and I'm out of
ideas on how to debug this any further.


Regards,
Florian
--
Stadt Karlsruhe, Medienbüro
Tel: 0721-133-1884
florian.brucker at mb.karlsruhe.de