[ckan-dev] importers

David Read david.read at okfn.org
Wed Mar 16 10:53:33 UTC 2011


Hi Jindrich,

(copying to ckan-dev as they may be interested too)

The /ckan/lib/spreadsheet_importer.py in Pudo's repo is code which
reads in CSV and XLS files and gives records. It is not in the main
codebase because it has been moved out to a CKAN extension:
https://bitbucket.org/okfn/ckanext/src/default/ckanext/importer/ where
it is used and actively maintained.

The idea is that you have a spreadsheet with each row relating to one
package's metadata. For each spreadsheet format you just write some
simple code (derived from PackageImporter) to convert this row record
(dictionary keyed by column title) into a package dictionary. You can
then use a standard Loader (in ckanext/loader.py) to load it over the
CKAN API into a CKAN website. This is all run from the command line.

It boils down to having to write one function to suit your spreadsheet
format, which could be as simple as this:

class MyFormatImporter(SpreadsheetPackageImporter):
    def record_2_package(self, row_dict):
        # row_dict is one spreadsheet row, keyed by column title
        package = {
            'title': row_dict['title'],
            'name': self.name_munge(row_dict['title']),
            'description': row_dict['description'],
            'license': self.license_2_license_id(row_dict['license_name']),
            # a package can have several resources, so this is a list of dicts
            'resources': [{'url': row_dict['url'], 'format': row_dict['format']}]
        }
        return package
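
To try out the row-to-package idea without CKAN or ckanext installed,
here is a rough stand-alone sketch using only the Python standard
library. It is not the ckanext code: munge_name is a hypothetical
stand-in for name_munge, the column titles and sample data are made up,
the license lookup is left out, and 'resources' being a list of dicts
is an assumption about the package dictionary.

```python
import csv
import io
import re


def munge_name(title):
    """Hypothetical stand-in for name_munge: make a lowercase, URL-friendly name."""
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower())
    return slug.strip('-')


def record_to_package(row_dict):
    """Convert one spreadsheet row (keyed by column title) into a package dict."""
    return {
        'title': row_dict['title'],
        'name': munge_name(row_dict['title']),
        'description': row_dict['description'],
        # Assumption: resources are given as a list of dicts.
        'resources': [{'url': row_dict['url'], 'format': row_dict['format']}],
    }


def packages_from_csv(csv_file):
    """Yield one package dict per data row of an open CSV file."""
    # DictReader uses the first row as column titles.
    for row_dict in csv.DictReader(csv_file):
        yield record_to_package(row_dict)


# Made-up sample data standing in for a real metadata spreadsheet.
SAMPLE_CSV = (
    "title,description,url,format\n"
    "Road Traffic Counts 2010,Annual counts,http://example.org/roads.csv,CSV\n"
)

packages = list(packages_from_csv(io.StringIO(SAMPLE_CSV)))
print(packages[0]['name'])
```

In the real setup the resulting package dictionaries would be handed to
a Loader rather than printed.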

For full documentation and fuller examples, see
http://packages.python.org/ckan/loader_scripts.html

We did play with the idea of putting a web front-end to these
importers, but we realised that the format of these metadata
spreadsheets varies so much, that writing one interface to cope with
all the different inconsistencies is a big project. Our current
thinking is that it is much easier to express this all in code (as
explained above). If you disagree and have ideas for how this could
work, then feel free to fork our code and see what you can come up
with. The mothballed code is here:
https://bitbucket.org/okfn/ckanext-importer and the remains
of the docs are here: http://packages.python.org/ckan/importer.html

Do let us know if this is of interest.

David

2011/3/15 Jindřich Mynarz <mynarzjindrich at gmail.com>:
> Hi David,
>
> The importer I had in mind is Friedrich's
> /ckan/lib/spreadsheet_importer.py in his version of CKAN [1]. My
> question is whether there are plans to incorporate the Excel importer
> in CKAN's user interface. It seems this was the plan a couple of
> months back [2] but it wasn't implemented. CKAN's documentation
> says the importer feature is not available [3], so, will it be
> available or are there other plans for it?
>
> If it was decided that the Excel spreadsheet importer should stay
> separate, how are you supposed to use it? If there's documentation
> for it, just point me to it.
>
> Best,
>
> Jindrich
>
> [1] https://bitbucket.org/pudo/ckan/src
> [2] http://trac.ckan.org/ticket/178
> [3] http://packages.python.org/ckan/importer.html
>
> On Tue, Mar 15, 2011 at 7:01 PM, David Read <david.read at okfn.org> wrote:
>> Hi Jindřich!
>>
>>> @pudo Do you know if there are plans to merge your CKAN's spreadsheet importer to the main CKAN's branch?
>>
>> Cheers for your message on Twitter. Just wondering which importers and
>> branches you are referring to? Are you looking at the google docs
>> importer in the ckanclient repo? And were you talking about it being
>> merged into the ckanext spreadsheet importer code perhaps?
>>
>> Also, do let us know what your project involving CKAN is - it's
>> always good to hear about it being useful (hopefully!) Thanks,
>>
>> David
>>
>



