[ECODP-dev] Dataset usage statistics

ZAJAC Agnieszka (OP) Agnieszka.ZAJAC at publications.europa.eu
Wed Oct 16 15:21:52 UTC 2013


Dear John,

Thank you for the responses; they are very helpful. My comments are in green.

Regards,
Agnieszka


From: John Glover [mailto:john.glover at okfn.org]
Sent: Wednesday, October 16, 2013 2:15 PM
To: ZAJAC Agnieszka (OP)
Cc: Project list for EC ODP CKAN project; Bert.Van.Nuffelen at tenforce.com; ZADRA Julien (OP-EXT); HOHN Norbert (OP); PASTOR CAMARASA José Juan (OP); SABETE Vafa (OP); MEYER André (OP-EXT); BOUSSERT Philippe (OP)
Subject: Re: [ECODP-dev] Dataset usage statistics

Hello Agnieszka,


 A few minutes to generate the report sounds reassuring. Could the script be launched automatically to generate the stats, e.g. every day at a particular hour?

Yes, I don't think that this should be a problem.
 OK. Then it is now an internal matter for us to arrange.

Apart from that, I have some questions regarding the report (see attached file):

-       Why is the name of the data provider present in some cases and missing in others? I checked the first 3 datasets and they all belong to Estat, so why does the script omit this information in some cases?
The publisher name is only fetched for datasets where the recent view count is > 0, which seems like an oversight on our part.
 I created a Jira issue for it.
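
To illustrate the oversight described above, here is a minimal sketch of a report builder that looks up the publisher for every dataset, regardless of its recent view count. It is not the actual ECODP/CKAN command: the function name `build_report_rows`, the dictionary keys, and the sample data are all assumptions made for illustration.

```python
# Hypothetical sketch: function name, keys and sample data are assumptions,
# not the code of the real report command.

def build_report_rows(datasets, publishers):
    """Return one report row per dataset, always including the publisher."""
    rows = []
    for ds in datasets:
        # Look up the publisher unconditionally, not only when
        # recent_views > 0 (the behaviour reported in this thread).
        publisher = publishers.get(ds["publisher_id"], "")
        rows.append({
            "name": ds["name"],
            "publisher": publisher,
            "recent_views": ds["recent_views"],
            "total_views": ds["total_views"],
        })
    return rows


# Made-up sample data: one dataset with no recent views, one with some.
datasets = [
    {"name": "dataset-a", "publisher_id": "estat", "recent_views": 0, "total_views": 12},
    {"name": "dataset-b", "publisher_id": "estat", "recent_views": 3, "total_views": 40},
]
publishers = {"estat": "Estat"}

rows = build_report_rows(datasets, publishers)
for row in rows:
    print(row["name"], row["publisher"], row["recent_views"], row["total_views"])
```

With this approach the dataset with zero recent views still carries its publisher name in the report.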

-       What is the time span of the total views? Since the last reset of the database?
Yes, since the database was created.
 That's not clear. Bert said the report is generated for the time span predefined in the query. However, that time span is not indicated in the result table. I am creating a Jira issue for an improvement so that, if possible, this information appears in the report.

-       Why are datasets with private status (e.g. SJ) also included, although they are not visible on the portal?
The command does not currently query for public/private status (as the person running it must be a sysadmin anyway).
 We are generally interested in views of public data, but we can live with it.
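
If private datasets ever need to be left out of the report, one possible approach is to filter the rows after the command has produced them. The sketch below assumes each row carries a boolean `private` flag; the function name `public_only` and the sample rows are hypothetical and not part of the current command.

```python
# Hypothetical sketch: assumes each report row exposes a boolean "private"
# flag. The current command does not query this status.

def public_only(rows):
    """Keep only the rows whose dataset is not marked as private."""
    return [row for row in rows if not row.get("private", False)]


# Made-up sample rows for illustration.
report_rows = [
    {"name": "public-dataset", "private": False, "total_views": 25},
    {"name": "private-dataset", "private": True, "total_views": 4},
]

public_rows = public_only(report_rows)
print([row["name"] for row in public_rows])
```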

I cross-checked the stats with the results of the "most viewed" datasets. The outcome is quite curious: the number of views corresponds to "total view" in the CKAN report; however, how datasets are selected for display in this column is a complete mystery. They seem to be randomly selected (see the attached file). Any idea why?

Yes, there is a bug in the current query (see ODP-291 on Jira) that is incorrectly filtering using the date that the tracking data for each dataset was updated. In effect, I think that you are only seeing datasets that were actually viewed in the previous day (as opposed to the highest view count up to the previous day, regardless of when it was last viewed).
I know it will change in 1.00. It would be great if you could give us some information on how it is expected to work when taking the data from Solr.
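
The difference between the current and the intended selection can be sketched as follows. This is only one reading of John's description, not the actual query tracked in ODP-291: the row layout (`dataset`, `date`, `total`), both function names, and the sample data are assumptions.

```python
from datetime import date

# Hypothetical sketch: the tracking row layout and the sample data are
# assumptions; this is not the query from ODP-291.

def most_viewed_buggy(tracking, day, limit=10):
    """Rank only datasets whose tracking row was updated exactly on `day`."""
    rows = [r for r in tracking if r["date"] == day]
    rows.sort(key=lambda r: r["total"], reverse=True)
    return [r["dataset"] for r in rows[:limit]]


def most_viewed_intended(tracking, day, limit=10):
    """Rank all datasets by their latest running total up to `day`."""
    latest = {}
    for r in tracking:
        if r["date"] <= day:
            current = latest.get(r["dataset"])
            # Keep the most recent tracking row per dataset.
            if current is None or r["date"] > current["date"]:
                latest[r["dataset"]] = r
    rows = sorted(latest.values(), key=lambda r: r["total"], reverse=True)
    return [r["dataset"] for r in rows[:limit]]


# Made-up data: "popular" has many views but was last viewed days ago.
tracking = [
    {"dataset": "popular", "date": date(2013, 10, 10), "total": 100},
    {"dataset": "recent", "date": date(2013, 10, 14), "total": 4},
    {"dataset": "recent", "date": date(2013, 10, 15), "total": 5},
]
previous_day = date(2013, 10, 15)

buggy = most_viewed_buggy(tracking, previous_day)
intended = most_viewed_intended(tracking, previous_day)
print(buggy)
print(intended)
```

In this made-up data the date-equality filter drops the dataset with the highest total because it was not viewed on the previous day, which matches the seemingly random selection observed in the "most viewed" list.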

Regards,
John