[ckan-dev] potential lost resource in case of 2 simultaneous requests ?
Alex Gartner
alexandru.gartner+ckan at gmail.com
Sun Jul 2 00:27:55 UTC 2017
I've slightly changed package_show() and the get() function of the Package
model to use with_for_update()
<http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.with_for_update>
from SQLAlchemy. The idea is to issue a "SELECT ... FOR UPDATE" when retrieving
the package, so that another request/transaction can't modify it in the
meantime. I tested this locally and it seemed to work fine: it prevented the
resource from disappearing in the scenario from my initial email.
package_show(): added support for a 'for_update' flag in the context. This
flag is set when resource_create() / resource_update() starts:

    for_update = context.get('for_update', False)
    pkg = model.Package.get(name_or_id, for_update)

get() in the Package model:

    @classmethod
    def get(cls, reference, for_update=False):
        '''Returns a package object referenced by its id or name.'''
        query = meta.Session.query(cls).filter(cls.id == reference)
        if for_update:
            # Emit SELECT ... FOR UPDATE so the row stays locked until commit
            query = query.with_for_update()
        pkg = query.first()
        if pkg is None:
            pkg = cls.by_name(reference)
        return pkg
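
To make the effect easier to see outside of CKAN, here is a minimal sketch that replays the timeline from my initial email against a toy in-memory "database". Nothing in it is CKAN or SQLAlchemy code: ToyDb, lost_update() and with_row_lock() are hypothetical names, and a threading.Lock only stands in for the row lock that "SELECT ... FOR UPDATE" would take in PostgreSQL.

```python
import copy
import threading


class ToyDb:
    """In-memory stand-in for the package table (hypothetical, not CKAN code)."""

    def __init__(self):
        self.rows = {"D": {"resources": ["resource1"]}}
        self.row_lock = threading.Lock()  # plays the role of the row lock on D

    def package_show(self, for_update=False):
        if for_update:
            self.row_lock.acquire()  # like SELECT ... FOR UPDATE: wait for the row
        return copy.deepcopy(self.rows["D"])  # each request works on its own copy

    def package_update(self, pkg, for_update=False):
        self.rows["D"] = copy.deepcopy(pkg)  # overwrites the whole resource list
        if for_update:
            self.row_lock.release()  # like COMMIT: the row lock is dropped


def lost_update():
    """Replay steps 3-10 of the timeline without any locking."""
    db = ToyDb()
    d_r2 = db.package_show()               # step 3: R2 reads D
    d_r2["resources"][0] = "resource1-v2"  # step 4: R2 changes resource1
    d_r1 = db.package_show()               # step 5: R1 reads D
    d_r1["resources"].append("resource2")  # step 6: R1 adds resource2
    db.package_update(d_r1)                # steps 7-8: R1 writes and commits
    db.package_update(d_r2)                # steps 9-10: R2 overwrites R1's write
    return db.rows["D"]["resources"]


def with_row_lock():
    """Same two requests, but both read D with for_update=True."""
    db = ToyDb()
    r2_has_lock = threading.Event()

    def r2_resource_update():
        pkg = db.package_show(for_update=True)
        r2_has_lock.set()  # let R1 start only once R2 holds the row
        pkg["resources"][0] = "resource1-v2"
        db.package_update(pkg, for_update=True)

    def r1_resource_create():
        r2_has_lock.wait()
        # Waits here if R2 still holds the row, so R1 sees R2's committed data.
        pkg = db.package_show(for_update=True)
        pkg["resources"].append("resource2")
        db.package_update(pkg, for_update=True)

    threads = [
        threading.Thread(target=r1_resource_create),
        threading.Thread(target=r2_resource_update),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db.rows["D"]["resources"]
```

In this sketch lost_update() ends with only the modified resource1 (resource2 is gone), while with_row_lock() keeps both resources, because R1 cannot read D until R2 has written it back.
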
On Sat, Jul 1, 2017 at 1:59 AM, Alex Gartner <
alexandru.gartner+ckan at gmail.com> wrote:
> Hello,
>
> I'm wondering whether the fact that resource_create() (and similarly
> resource_update()) does a package_show()
> <https://github.com/OCHA-DAP/hdx-ckan/blob/dev/ckan/logic/action/create.py#L285>
> followed later by a package_update()
> <https://github.com/OCHA-DAP/hdx-ckan/blob/dev/ckan/logic/action/create.py#L302>
> can lead to a lost resource in some special cases.
>
> Since PostgreSQL uses "read committed" transaction isolation by default, I
> think the following could happen:
> Two almost simultaneous requests (R1, R2) come to the API and are handled
> by different processes/threads. Both modify dataset D, *which
> already has one resource (resource1)*.
>
> TIMELINE
>
> 1. (R1) starts *resource_create*(resource2) on dataset D
> 2. (R2) starts *resource_update*(resource1) on dataset D
> 3. (R2) does package_show(D) => *gets D with just resource1*
> 4. (R2) changes resource1 in D
> 5. (R1) does package_show(D) => gets D with resource1
> 6. (R1) adds resource2 to D => D.resources = [resource1, resource2]
> 7. (R1) does package_update(D)
> 8. (R1) resource_create() finishes, everything is committed
> successfully to the db => D has 2 resources in the db
> 9. (R2) does package_update(D) - please note that here D only has one
> resource as read in step 3
> 10. (R2) resource_update() finishes, everything is committed
> successfully to the db => D has just resource1 (resource2 disappears)
>
> Question: does this seem possible? I reproduced it locally on a
> slightly modified CKAN running under paster, but that could also mean
> that I have something misconfigured or changed. Before I start thinking
> about strategies for avoiding this scenario (like a different transaction
> isolation level), is there some mechanism in CKAN that would prevent it?
> Has anyone run into such an issue?
>
> Thank you,
> Alex Gartner
>
>