[ckan-dev] Fwd: Resource validation

Tue Mar 6 14:17:50 UTC 2012

Hi Ian,

The functionality of this seems so close to what the QA extension does
at the moment (also does a HEAD request and checks header info) that I
would recommend just extending QA slightly to do what you want. You
could maybe just break the validation / resource scoring process into
two tasks, with an additional API call (optional) to save information
back against the resource object instead of writing it to the
task_status table. This is what used to happen in fact.

So my vote would be to keep it in an extension (part of QA) and do it
asynchronously, but whether to go sync/async would depend on the
workflow of the new form that you are making. Of course, you can call
Celery tasks synchronously as well, so it should be straightforward
to change later if necessary.
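
For illustration, the HEAD check under discussion could look roughly like
the sketch below. It uses only the Python standard library; the function
name `head_check` and the keys of the returned dict are made up for this
example and are not taken from the QA extension or from the pad proposal.

```python
import urllib.error
import urllib.request


def head_check(url, timeout=10):
    """Send a HEAD request to `url` and summarise the response.

    Hypothetical helper: the dict layout is an assumption for this sketch,
    not the format used by the QA extension.
    """
    try:
        # Building the Request can raise ValueError for a malformed URL,
        # so it lives inside the try block as well.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return {
                "ok": True,
                "status": response.status,
                "content_type": response.headers.get("Content-Type"),
                "content_length": response.headers.get("Content-Length"),
                "error": None,
            }
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (404, 500, ...).
        return {
            "ok": False,
            "status": exc.code,
            "content_type": None,
            "content_length": None,
            "error": str(exc),
        }
    except (OSError, ValueError) as exc:
        # DNS failure, refused connection, timeout, or malformed URL.
        return {
            "ok": False,
            "status": None,
            "content_type": None,
            "content_length": None,
            "error": str(exc),
        }
```

Wrapped in a Celery task, a function like this could be queued with
`delay()` or run in-process with `apply()`, which is why switching between
async and sync later should be cheap.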

Cheers,
John

On 6 March 2012 12:41, Ian Murray <ian.murray at okfn.org> wrote:
> Moving to ckan-dev list.
>
>
> ---------- Forwarded message ----------
> From: Ian Murray <ian.murray at okfn.org>
> Date: 6 March 2012 12:16
> Subject: Resource validation
> To: Ckan-coord at lists.okfn.org
>
>
> Hi all,
>
> DGU require a little bit of resource validation when linking to files.  I've
> written up a quick proposal for CKAN-core
> here: http://ckan.okfnpad.org/dgu-package-form (lines 316-428).
>
> In brief: when creating a resource that links to a file (not uploading a
> file, nor linking to a service at this stage), an ajax request is made to
> the server, which in turn makes a HEAD request to the supplied URL.  It then
> returns a dict of information that can be used client-side to populate some
> of the resource fields, and to provide some highlighting (and an error
> message) if the URL appears to be broken.
>
> There's no validation, it's just a tool to help with the dataset and
> resource creation.  ie - there's no URL validation server-side when the
> dataset is saved.  I think broken existing links are better found offline
> with the qa extension.
>
> There are two things I'm uncertain about:
>
> 1. Should this be core functionality, or provided through an extension?  And
> if through an extension, which one?  Its own, or would it fit naturally
> into a different one (qa, archiver, ...?)
>
> 2. Should the HEAD request to the remote server be made off the
> request thread (ie - in an asynchronous celery task) or not?  Normally with
> this sort of thing, I would make the remote request in a separate process
> from the one that handles the server's http requests.  However, doing that
> would mean:
>
>  a) a barebones ckan deployment would require celeryd (which I'm not sure
> we'd want to do?)
>  b) a slightly more complicated workflow. (not a problem, just being a bit
> lazy)
>
>    And I'm willing to break that rule for this because:
>
>   a) It's only a HEAD request.
>   b) It's not a high-traffic part of the site
>
> Any feedback appreciated :)
>
> Ian.