[ckan-dev] [ckan-discuss] ANN: Semantic CKAN - Revisited

David Read d.t.read at gmail.com
Mon Apr 18 13:22:29 UTC 2011



On 18 Apr 2011, at 12:41, William Waites <ww at styx.org> wrote:

> Hi David, thanks for pointing out the revision API; I wasn't aware of it. However, there are a couple of issues. First, consider the bootstrap of an empty mirror.

I think an effective mirror should combine occasional systematic crawls (as you have now) with frequent updates that fetch only the changes (via the revision API).
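
A rough sketch of how a mirror might decide between the two modes. The function name and the weekly full-crawl interval are assumptions for illustration only; nothing in CKAN prescribes them.

```python
from datetime import datetime, timedelta, timezone


def choose_sync_mode(last_full_crawl, now, full_crawl_every=timedelta(days=7)):
    """Return "full" when a systematic crawl is due, otherwise "incremental".

    last_full_crawl is None for an empty mirror that has never been crawled.
    The 7-day default is an arbitrary, hypothetical interval.
    """
    if last_full_crawl is None or now - last_full_crawl >= full_crawl_every:
        return "full"
    return "incremental"


if __name__ == "__main__":
    now = datetime(2011, 4, 18, tzinfo=timezone.utc)
    # An empty mirror has no previous crawl, so it bootstraps with a full one.
    print(choose_sync_mode(None, now))
    # A mirror crawled yesterday only needs the changes.
    print(choose_sync_mode(now - timedelta(days=1), now))
```

An empty mirror falls into the "full" branch, which covers the bootstrap case; after that, most runs take the cheaper incremental path.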


> Crawling the revisions will be worse because of duplicates. Also, from an outside perspective, CKAN revisions are an implementation detail that I'm not interested in. And using them means double the traffic in the best case, since I have to get the list of revisions, then get each revision, then get each package.

I think the traffic will be far lighter than getting the full list of packages and polling every one.
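
As a minimal sketch of the incremental step, assuming the two revision calls quoted further down in this thread (search/revision with since_time, then rest/revision/<id>) behave as in that example. The helper names and the injectable `fetch` argument are my own, not part of CKAN.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the example requests in this thread.
BASE = "http://ckan.net/api"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def changed_packages(since_time, fetch=fetch_json, base=BASE):
    """Return the set of package names touched by revisions since since_time.

    `fetch` is injectable so the logic can be exercised without the network.
    """
    revision_ids = fetch(f"{base}/search/revision?since_time={quote(since_time)}")
    names = set()
    for rev_id in revision_ids:
        revision = fetch(f"{base}/rest/revision/{rev_id}")
        # A revision lists the packages it changed; a set removes repeats.
        names.update(revision.get("packages", []))
    return names
```

Collecting the names into a set means a package touched by several revisions is fetched only once, so the duplicates cost one small revision request each rather than a repeated package download.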

> 
> Supplementary questions: what format is the time parameter?

See the example in the docs.

> Can I specify a time zone?

No. There is a ticket for this I'm progressing but it is low priority.

> If not, what timezone is it taken to be?

UTC for all timestamps with current versions of CKAN. (I need to add that to the docs and be explicit wherever time is exposed.) It's worth double-checking for older versions.
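
Since the timestamps come back without an offset (e.g. "2011-04-18T00:20:10.385222" in the revision example quoted below), a client has to attach UTC itself. A small sketch, assuming the timestamp is always in that ISO-like form with or without fractional seconds; the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_ckan_timestamp(value):
    """Parse a naive CKAN timestamp string and mark it explicitly as UTC."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {value!r}")


if __name__ == "__main__":
    ts = parse_ckan_timestamp("2011-04-18T00:20:10.385222")
    print(ts.isoformat())
```

Keeping the value timezone-aware on the client side avoids accidentally comparing it against local time when working out the next since_time.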

Dave

> If it is server local time, how do I find out what that is? If it is local time, then the time zone should be added to the admin metadata call.
> 
> 
> David Read <david.read at okfn.org> wrote:
> 
>> Will,
>> 
>> On 18 April 2011 10:39, William Waites <ww at styx.org> wrote:
>>> The revision feed could be improved to support this. Right now it isn't very well structured and basically needs to be scraped. Also, JSON is easier to deal with than XML. I think it would be a good addition to the API. Adding an Atom feed to semantic.ckan.net would be worthwhile too.
>> 
>> The Atom feed is indeed for humans to read, not so much machines. I
>> should have been clearer - the revision API gives this info in the
>> JSON format you require. I've done an example below which shows how
>> you can get the name of the changed packages (use API version 2 to get
>> the IDs of the packages).
>> 
>> David
>> 
>> GET http://ckan.net/api/search/revision?since_time=2011-04-17
>> response:
>> ["9c5c0383-5e63-479b-b844-e27a1f2416dc",
>> "b6522329-c040-47cc-8714-150a7f657bc3",
>> "c3a6e66a-1f8e-4499-9565-9b377d67cbd1",
>> "da7dab2a-d0ec-4275-b92e-83662e80b051",
>> "ed75e12d-f250-469f-844f-e80d80eb01b2",
>> "4528f9b4-0887-4304-81d3-010c32982fbc",
>> "b2c4890c-e7cd-4e6a-89fb-6a2f10a6b589",
>> "264cb5a8-2812-44ad-88ff-0389d2d26650",
>> "82a43c45-983b-4e64-8c16-9a776056c129",
>> "8b83fc7b-4582-4bc9-925e-304fd09a50ad",
>> "d6c24b24-33b2-419d-905e-9e097a8951ff",
>> "0a7a2f31-95cc-4509-8872-4873cdba6972",
>> "b7bcfd10-031f-434c-bbbb-8041bd42e7bc"]
>> 
>> GET http://ckan.net/api/rest/revision/9c5c0383-5e63-479b-b844-e27a1f2416dc
>> response:
>> {"id": "9c5c0383-5e63-479b-b844-e27a1f2416dc", "timestamp":
>> "2011-04-18T00:20:10.385222", "message": "", "author":
>> "http://aharth.myopenid.com/", "packages": ["linked-edgar"]}
>> 
>>> 
>>> 
>>> David Read <david.read at okfn.org> wrote:
>>> 
>>>> Great to see this live, Will!
>>>> 
>>>>> Also, unticketed as of yet, I think, is either a change to the
>>>>> search API or a new api call that is like /api/rest/package
>>>>> but takes a modified-since parameter. This would reduce load
>>>>> across the board dramatically.
>>>> 
>>>> I agree it might seem excessive for you to regularly poll all
>>>> packages. Why not just follow the revision feed? It's designed for
>>>> this sort of application, taking the since_time parameter. And this
>>>> might be useful to add to your api too.
>>>> 
>>>> David
>>> 



