[ckan-dev] Harvestor extension for CKAN

Lasse Vestergaard ibbernik at gmail.com
Wed Jan 16 13:00:19 UTC 2013


Hi Adriá.

First of all thank you very much for your patience.

I have no custom extensions and I have installed CKAN on a clean Ubuntu
10.04, so all that's in the pip freeze log was there already (plus the CKAN
parts I installed).

I have been using this manual:
http://docs.ckan.org/en/ckan-1.8/install-from-package.html

My ini file:

#
# ckan - Pylons configuration
#
# The %(here)s variable will be replaced with the parent directory of this
file
#
[DEFAULT]
debug = true
email_to = root
smtp_server = localhost
error_email_from = ckan-std at ckan1-desktop

[server:main]
use = egg:Paste#http
host = 0.0.0.0
port = 5000

[app:main]
use = egg:ckan
full_stack = true
cache_dir = /var/lib/ckan/std/data
beaker.session.key = ckan
beaker.session.secret = m572KeLw/Yvedi1rRkZodxMDE
app_instance_uuid = {099c3568-988e-4407-8586-a0cf51d5a8b0}

# List the names of CKAN extensions to activate.
# Note: This line is required to be here for packaging, even if it is empty.
ckan.plugins = stats datastore datastorer harvest ckan_harvester

# If you'd like to fine-tune the individual locations of the cache data dirs
# for the Cache data, or the Session saves, un-comment the desired settings
# here:
#beaker.cache.data_dir = %(here)s/data/cache
#beaker.session.data_dir = %(here)s/data/sessions

# WARNING: *THE LINE BELOW MUST BE UNCOMMENTED ON A PRODUCTION ENVIRONMENT*
# Debug mode will enable the interactive debugging tool, allowing ANYONE to
# execute malicious code after an exception is raised.
set debug = false

# Specify the database for SQLAlchemy to use:
# * Postgres is currently required for a production CKAN deployment
# * Sqlite (memory or file) can be used as a quick alternative for testing
sqlalchemy.url = postgresql://std:LIyEh0dR-V@localhost/std
#sqlalchemy.url = sqlite:///
#sqlalchemy.url = sqlite:///%(here)s/somedb.db

## Datastore
## Uncommment to set the datastore urls
ckan.datastore.write_url = postgresql://writeuser:password@localhost
/datastore
ckan.datastore.read_url = postgresql://readonlyuser:password@localhost
/datastore

# repoze.who config
who.config_file = /etc/ckan/std/who.ini
who.log_level = warning
who.log_file = %(cache_dir)s/who_log.ini

# Location of RDF versions of datasets
#rdf_packages = http://semantic.ckan.net/record/

# Location of licenses group (defaults to cached local version of ckan
group)
#licenses_group_url =
http://licenses.opendefinition.org/licenses/groups/ckan.json

# Dataset form to use
package_form = standard

# Hide certain extras fields from dataset read form:
# package_hide_extras = for_search_index_only

# API configuration
#apikey_header_name = X-CKAN-API-Key

## extra places to look for templates and public files (comma separated
lists)
## any templates/files found will override correspondingly named ones in
## ckan/templates/ and ckan/public
## (e.g. to override main layout template layout.html or add extra css
files)
# extra_template_paths = %(here)s/my-templates
# extra_public_paths = %(here)s/my-public

# Dataset form integration
#package_edit_return_url = http://another.frontend/dataset/<NAME>
#package_new_return_url = http://another.frontend/dataset/<NAME>


# Turn on messaging with carrot, default to false
#ckan.async_notifier = true
# Messaging module used by carrot:
# * pyamqplib - AMQP (e.g. for RabbitMQ)
# * queue - native Python Queue (debugging and tests only)
#carrot_messaging_library = pyamqplib

## Perform search just using database (rather than use e.g. solr).
## In this setup search is crude and limited .e.g no full-text search, no
faceting ...
## However, very useful for getting up and running quickly with CKAN
# ckan.simple_search = 1

## Title of site (using in several places including templates and <title>
tag
ckan.site_title = CKAN

## Logo image to use on the home page
ckan.site_logo = /img/logo_64px_wide.png

## Site tagline / description (used on front page)
ckan.site_description =

## Used in creating some absolute urls (such as rss feeds, css files) and
## dump filenames
ckan.site_url =http://localhost

## Favicon (default is the CKAN software favicon)
ckan.favicon = /images/icons/ckan.ico

## The gravatar default to use.  This can be any of the pre-defined strings
## as defined on http://en.gravatar.com/site/implement/images/ (e.g.
"identicon"
## or "mm").  Or it can be a url, e.g. "http://example.com/images/avatar.jpg
"
ckan.gravatar_default = identicon

## Solr support
solr_url = http://127.0.0.1:8983/solr

## Automatic indexing. Make all changes immediately available via the search
## after editing or creating a dataset. Default is true. If for some reason
## you need the indexing to occur asynchronously, set this option to 0.
# ckan.search.automatic_indexing = 1

## An 'id' for the site (using, for example, when creating entries in a
common search index)
## If not specified derived from the site_url
ckan.site_id = std

## API url to use (e.g. in AJAX callbacks)
## Enable if the API is at a different domain
# ckan.api_url = http://www.ckan.net

## html content to be inserted just before </head> tag (e.g. extra
stylesheet)
## NB: can use html e.g. <strong>blah</strong>
## NB: can have multiline strings just indent following lines
# ckan.template_head_end = <link rel="stylesheet" href="
http://mysite.org/css/custom.css" type="text/css">

## html content to be inserted just before </body> tag (e.g. google
analytics code)
## NB: can use html e.g. <strong>blah</strong>
## NB: can have multiline strings just indent following lines
# ckan.template_footer_end =

# These three settings (ckan.log_dir, ckan.dump_dir and ckan.backup_dir) are
# all used in cron jobs, not in CKAN itself. CKAN logging is configured
# in the logging configuration below
# Directory for logs (produced by cron scripts associated with ckan)
ckan.log_dir = %(here)s/log
# Directory for JSON/CSV dumps (must match setting in apache config)
ckan.dump_dir = /var/lib/ckan/std/static/dump
# Directory for SQL database backups
ckan.backup_dir = %(here)s/backup

# Default authorizations for new domain objects
#ckan.default_roles.Package = {"visitor": ["reader"], "logged_in":
["reader"]}
#ckan.default_roles.Group = {"visitor": ["reader"], "logged_in": ["reader"]}
#ckan.default_roles.System = {"visitor": ["reader"], "logged_in":
["editor"]}
#ckan.default_roles.AuthorizationGroup = {"visitor": ["reader"],
"logged_in": ["reader"]}

## Ckan public and private recaptcha keys [localhost]
#ckan.recaptcha.publickey =
#ckan.recaptcha.privatekey =

# Locale/languages
ckan.locale_default = en
#ckan.locales_offered =
# Languages are grouped by percentage of strings in CKAN 1.8 translated
# (those with 100% first, then those with >=80%, then >=50%, then <50%) and
# within these groups roughly sorted by number of worldwide native speakers
# according to Wikipedia.
ckan.locale_order = en es pt_BR ja fr it ko_KR cs_CZ ca fi el sv sr
sr at latinno sk ru de pl nl bg hu sa sl lv
ckan.locales_filtered_out =

## Atom Feeds
#
# Settings for customising the metadata provided in
# atom feeds.
#
# These settings are used to generate the <id> tags for both feeds
# and entries. The unique <id>s are created following the method
# outlined in http://www.taguri.org/  ie - they generate tagURIs, as
specified
# in http://tools.ietf.org/html/rfc4151#section-2.1 :
#
# <id>tag:thedatahub.org
,2012:/feeds/group/933f3857-79fd-4beb-a835-c0349e31ce76</id>
#
# Each component has the corresponding settings:
#
#   "thedatahub.org" is ckan.feeds.authority_name
#   "2012"           is ckan.feeds.date
#

# Leave blank to use the ckan.site_url config value, otherwise set to a
# domain or email address that you own.  e.g. thedatahub.org or
# admin at thedatahub.org
ckan.feeds.authority_name =

# Pick a date of the form "yyyy[-mm[-dd]]" during which the above domain was
# owned by you.
ckan.feeds.date = 2012

# If not set, then the value in `ckan.site_id` is used.
ckan.feeds.author_name =

# If not set, then the value in `ckan.site_url` is used.
ckan.feeds.author_link =

## File Store
#
# CKAN allows users to upload files directly to file storage either on the
local
# file system or to online ‘cloud’ storage like Amazon S3 or Google Storage.
#
# If you are using local file storage, remember to set ckan.site_url.
#
# To enable cloud storage (Google or S3), first run: pip install boto
#
# @see http://docs.ckan.org/en/latest/filestore.html

# 'Bucket' to use for file storage
ckan.storage.bucket = /home/ckan1/ckan

# To enable local file storage:
ofs.impl = pairtree
ofs.storage_dir = /home/ckan1/ckan

# To enable Google cloud storage:
#ofs.impl = google
#ofs.gs_access_key_id =
#ofs.gs_secret_access_key =

# To enable S3 cloud storage:
#ofs.impl = s3
#ofs.aws_access_key_id = ....
#ofs.aws_secret_access_key = ....

## ===================================
## Extensions

## Config option to enable the (1 day) cache for stats
## Default (if not defined) is True as stats computations are intensive
# ckanext.stats.cache_enabled = True

# Logging configuration
[loggers]
keys = root, ckan, ckanext

[handlers]
keys = console, file

[formatters]
keys = generic

[logger_root]
level = WARNING
handlers = console, file

[logger_ckan]
level = INFO
handlers = console, file
qualname = ckan
propagate = 0

[logger_ckanext]
level = DEBUG
handlers = console, file
qualname = ckanext
propagate = 0

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[handler_file]
class = logging.handlers.RotatingFileHandler
formatter = generic
level = NOTSET
args = ("/var/log/ckan/std/std.log", "a", 20000000, 9)

[formatter_generic]
format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s

/Lasse


2013/1/16 Adrià Mercader <adria.mercader at okfn.org>

> Hi Lasse,
>
> We would love to help, but I've run out of ideas. I tried on my local
> machine to harvest http://ckan.odaa.dk with ckan 1.8 and
> ckanext-harvest master and it worked fine. 12 datasets were imported,
> including the ones with resources like
> fredede-bygninger-og-fortidsminder.
>
> I strongly believe that there is an issue with your particular setup.
>
> Are you using any other extensions, specially custom ones?
> Can you post your ini file online with the security details edited off?
> I see from your pip freeze that there is a lot of stuff that is non
> CKAN related, like Twisted or pycurl which may be affecting things.
> Which instructions are you using for installing CKAN?
>
>
> Adrià
>
>
> On 16 January 2013 10:24, Lasse Vestergaard <ibbernik at gmail.com> wrote:
> > Hi again CKAN developers (primarily Adriá).
> >
> > I'm sorry that I push on this issue with the harvester extension, but I'm
> > working on a project for a municipality in Denmark, and I have to give a
> > talk about some of the things I'm working on. The harvester extension
> is, in
> > my opinion, important and extremely relevant and therefor, I would like
> to
> > be able to talk about this (I need to try it out to be able to give a
> proper
> > explanation of it's capabilities). I hope you can find time to help me
> out
> > on the issue.
> >
> > Best regards
> >
> > Lasse Vestergaard
> >
> > _______________________________________________
> > ckan-dev mailing list
> > ckan-dev at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/ckan-dev
> > Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev
> >
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/ckan-dev/attachments/20130116/c59cc828/attachment-0001.html>


More information about the ckan-dev mailing list