[ECODP-dev] Dataset usage statistics

ZAJAC Agnieszka (OP) Agnieszka.ZAJAC at publications.europa.eu
Tue Oct 15 11:35:55 UTC 2013


Dear John,

Thank you for the clarifications. A few minutes to generate the report sounds reassuring. Could the script be launched automatically to generate the stats, e.g. every day at a particular hour?
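
A daily run could be handled by any scheduler (cron, for example) calling a small wrapper. The sketch below only builds the command line with a dated output file, so that daily runs do not overwrite each other. The output directory and file naming are assumptions made for illustration; the argument order follows the ckan_user_stats.sh usage quoted later in this thread.

```python
from __future__ import annotations

from datetime import date
from pathlib import PurePosixPath

# Hypothetical locations, for illustration only.
SCRIPT = "./ckan_user_stats.sh"
OUTPUT_DIR = PurePosixPath("/tmp/ckan-stats")


def build_stats_command(run_date: date, from_date: date) -> list[str]:
    """Return the argument list for one report run.

    The output file carries the run date in its name; from_date is
    formatted as YYYY-MM-DD, the form the script expects.
    """
    out_file = OUTPUT_DIR / f"stats-{run_date.isoformat()}.csv"
    return [SCRIPT, str(out_file), from_date.isoformat()]


if __name__ == "__main__":
    cmd = build_stats_command(date.today(), date(2013, 1, 1))
    # A scheduled job would hand this list to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```

Whether a daily run is acceptable on production while an ingestion is in progress remains a question for the operations team.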

Apart from that, I have a few questions regarding the report (see attached file):

- Why is the name of the data provider present in some cases and not in others? I checked the first 3 datasets and they all belong to Estat, so why does the script omit this information in some cases?

- What time span does the total view count cover? Since the last reset of the database?

- Why are datasets with status private (e.g. SJ) also included, although they are not visible on the portal?

I cross-checked the stats with the results of the "most viewed" datasets list. The outcome is quite curious: the number of views corresponds to "total views" in the CKAN report, but how datasets are selected for display in this column is a complete mystery; they seem to be selected at random (see the attached file). Any idea why?
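
One way to make this cross-check reproducible is to rank the exported rows by total views and compare the top entries with what the portal displays. This is a minimal sketch, assuming the CSV starts with the header line quoted further down in this thread; the function name and the sample selection are illustrative, and the rows are taken from the test-system extract in this thread, not from the attached report.

```python
from __future__ import annotations

import csv
import io

# Three rows from the extract quoted in this thread, deliberately unsorted.
SAMPLE = """\
dataset id,dataset name,publisher name,total views,recent views (last 2 weeks)
33826c74-c364-448d-a6f5-af85bd7d55dc,1st,,3,0
ac5ddfbf-4ae4-4829-a025-669c92dd12a2,V1OEYc8mJFRn3cOlnJYXA,publ,5,5
15e07204-c6c2-4219-84c4-c4f8d46a0efa,helloworkd,acp_amb,4,4
"""


def top_viewed(csv_text: str, limit: int = 10) -> list[tuple[str, int]]:
    """Return (dataset name, total views) pairs, most viewed first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ranked = sorted(
        ((row["dataset name"], int(row["total views"])) for row in reader),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:limit]


if __name__ == "__main__":
    for name, views in top_viewed(SAMPLE, limit=2):
        print(name, views)
```

If the portal list differs from this ranking, the difference lies in how the portal selects datasets, not in the counts themselves.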

Regards,

Agnieszka Zając
Open Data Portal Section
Tel: +352 2929.42834

From: John Glover [mailto:john.glover at okfn.org]
Sent: Monday, October 14, 2013 5:49 PM
To: Project list for EC ODP CKAN project
Cc: ZAJAC Agnieszka (OP); ZADRA Julien (OP-EXT); HOHN Norbert (OP); BOUSSERT Philippe (OP); PASTOR CAMARASA José Juan (OP); SABETE Vafa (OP); MEYER André (OP-EXT)
Subject: Re: [ECODP-dev] Dataset usage statistics

Hi Bert,

- Does this command have any impact on CKAN (slowdown, halt, ...)? A data ingestion is always running, as each ingestion now takes 3 hours.
It pulls information from the PostgreSQL DB, so a limited slowdown is possible.

Yes, it queries the tracking tables; I would not expect any significant slowdown here.


- What is the estimated execution time for this?
I ran it on our test system for the full year 2013 and it took around 2 minutes. From the output on the screen, it seems that one select query is executed per day.

There are a couple of other aggregate select queries, but yes, I wouldn't expect this to take longer than a few minutes.


- Will these statistics even be correct when generated during a data ingestion?
They correspond to user clicks, and the ingestion does not count as a user click.

This is correct.


Regards,
John



@OKF: can you verify my response?
kind regards,
Bert



2013/10/10 ZAJAC Agnieszka (OP) <Agnieszka.ZAJAC at publications.europa.eu>
Dear Bert,

Andre has a few quite pertinent questions on the generation of this statistical data, especially given that this script hasn't been used since last year. Could you provide him with answers? Maybe testing it first on your environment would also be useful?

Thank you in advance.
Regards,

Agnieszka Zając
Open Data Portal Section
Tel: +352 2929.42834

From: MEYER André (OP-EXT)
Sent: Thursday, October 10, 2013 12:17 PM
To: ZAJAC Agnieszka (OP)
Cc: SABETE Vafa (OP); HOHN Norbert (OP); PASTOR CAMARASA José Juan (OP); BOUSSERT Philippe (OP); ZADRA Julien (OP-EXT)
Subject: RE: Dataset usage statistics

Hello Agnieszka,

I have a few questions before launching this on prod:


- Does this command have any impact on CKAN (slowdown, halt, ...)? A data ingestion is always running, as each ingestion now takes 3 hours.

- What is the estimated execution time for this?

- Will these statistics even be correct when generated during a data ingestion?

Kind regards,

André Meyer
Application Team - Integration engineer
_________________

Publications Office of the European Union
Unit A4 - Infrastructure and IT Security Systems

Halian S.à.r.l.  (under contract with the Publications Office)
• (+352) 2929-42442
• andre.meyer at ext.publications.europa.eu

From: ZAJAC Agnieszka (OP)
Sent: 10 October, 2013 12:10 PM
To: MEYER André (OP-EXT)
Cc: SABETE Vafa (OP); HOHN Norbert (OP); PASTOR CAMARASA José Juan (OP)
Subject: FW: Dataset usage statistics


Dear Andre,

Could you please generate statistics on dataset usage following the instructions from Bert below? On 00.09 please.

Thank you in advance.

Best regards,
Agnieszka


From: Bert Van Nuffelen [mailto:bert.van.nuffelen at tenforce.com]
Sent: Thursday, October 10, 2013 10:09 AM

To: ZAJAC Agnieszka (OP)
Cc: jurgen vannerom (jurgen.vannerom at tenforce.com); HOHN Norbert (OP); SABETE Vafa (OP); PASTOR CAMARASA José Juan (OP)
Subject: Re: Dataset usage statistics

Hi Agnieszka,
indeed. This has not changed.
Bert

2013/10/10 ZAJAC Agnieszka (OP) <Agnieszka.ZAJAC at publications.europa.eu>
Hi,

Thanks a lot for the quick reply. Can it be applied for a test on 00.09?

Agnieszka

From: Bert Van Nuffelen [mailto:bert.van.nuffelen at tenforce.com]
Sent: Thursday, October 10, 2013 10:02 AM
To: ZAJAC Agnieszka (OP)
Cc: jurgen vannerom (jurgen.vannerom at tenforce.com); HOHN Norbert (OP); SABETE Vafa (OP); PASTOR CAMARASA José Juan (OP)
Subject: Re: Dataset usage statistics

Hi Agnieszka,
here is the extract from our internal updated version.
1.1   DataSet Usage Statistics

To export the tracking stats, run the following command from the ckan management scripts

$ ./ckan_user_stats.sh <absolute-path-file> <from_date>

The arguments are mandatory. The script creates a file, which must be specified as an absolute path to the file, in which the statistics are dumped as a CSV. The date has the form YYYY-MM-DD.

For instance:

./ckan_user_stats.sh /applications/ecodp/users/ecodp/ckan/stat.csv 2013-01-01

will create a file stat.csv at the given location containing the user views stats from the first of January 2013. The content of the file is now of the form:

dataset id,dataset name,publisher name,total views,recent views (last 2 weeks)
ac5ddfbf-4ae4-4829-a025-669c92dd12a2,V1OEYc8mJFRn3cOlnJYXA,publ,5,5
84d8fbfe-5a57-4272-bca7-3f0a650c8121,xYIpDCIE81YFxghHr0z8Dg,cnect,4,4
dff2b3b0-2bfd-4260-ac65-610128779b52,IIJlaEf0VU835UgBkMuTrg,sg,4,3
15e07204-c6c2-4219-84c4-c4f8d46a0efa,helloworkd,acp_amb,4,4
33826c74-c364-448d-a6f5-af85bd7d55dc,1st,,3,0
f0592305-5c89-47f6-a6cd-7f566a43a782,VfGQxcxVB8MAfEYpM6ihBA,cnect,3,3
6a3a8b74-b5bd-49ca-82d9-cb1d927fd344,Qj83cpCYT0MrAZIJILOQQ,sg,2,2
5adb9513-8022-42bb-8f0d-af486520cd89,c8dxAO9R4zLEiZz84AWpQ,,1,0
90acb89a-cd89-4de9-8f52-53f942af0d3f,test-try-hijack,,1,0
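
In this format, a dataset without a publisher shows up as a row with an empty third field. The following is a minimal sketch for listing such rows, assuming the file starts with the header line shown above; the function name is illustrative and the sample is a subset of the rows above. It only reads the export and does not explain why the publisher is missing.

```python
from __future__ import annotations

import csv
import io

# Subset of the rows shown above, including the header line.
SAMPLE = """\
dataset id,dataset name,publisher name,total views,recent views (last 2 weeks)
ac5ddfbf-4ae4-4829-a025-669c92dd12a2,V1OEYc8mJFRn3cOlnJYXA,publ,5,5
33826c74-c364-448d-a6f5-af85bd7d55dc,1st,,3,0
5adb9513-8022-42bb-8f0d-af486520cd89,c8dxAO9R4zLEiZz84AWpQ,,1,0
90acb89a-cd89-4de9-8f52-53f942af0d3f,test-try-hijack,,1,0
"""


def datasets_without_publisher(csv_text: str) -> list[str]:
    """Return the names of datasets whose publisher field is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["dataset name"]
        for row in reader
        if not row["publisher name"].strip()
    ]


if __name__ == "__main__":
    for name in datasets_without_publisher(SAMPLE):
        print(name)
```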

Note:

Internally the script executes a CKAN paster command. That command can be executed without a date, in which case the date defaults to 3 days prior to the current date.
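
The defaulting rule can be sketched as follows. This is an illustration of the rule as described in the note, not the code of the paster command; the function name and signature are assumptions.

```python
from __future__ import annotations

from datetime import date, timedelta


def resolve_from_date(given: str | None, today: date) -> str:
    """Return the from-date as YYYY-MM-DD.

    Without an explicit date, fall back to 3 days before 'today'.
    """
    if given is None:
        return (today - timedelta(days=3)).isoformat()
    # Parsing and re-formatting rejects malformed dates with a ValueError.
    return date.fromisoformat(given).isoformat()


if __name__ == "__main__":
    print(resolve_from_date(None, date.today()))
    print(resolve_from_date("2013-01-01", date.today()))
```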
kind regards,
Bert


2013/10/10 ZAJAC Agnieszka (OP) <Agnieszka.ZAJAC at publications.europa.eu>
Dear Bert,
I would like to see the current report that can be generated from CKAN on dataset usage. Before I ask Andre to do it, could you please have a look at the instructions in the operational manual pasted below? They seem somewhat incomplete. Please let us know.
Thanks a lot in advance,
Agnieszka

1.  DataSet Usage Statistics
To export the tracking stats, run the following command from the ckan management scripts
$ ./ckan_user_stats.sh <directory> <from_date>
The directory argument is mandatory. In this directory the statistics are dumped as CSV files.
If a date is specified, then the data from the given date is aggregated into the export file. Otherwise, the default date is 3 days prior to the current date. The date has the form YYYY-MM-DD.
The content of the files is now of the form:





--
Bert Van Nuffelen

Semantic Technologies Software Architect at TenForce
www.tenforce.be

Bert.Van.Nuffelen at tenforce.com
Office: +32 (0)16 31 48 60
Mobile: +32 479 06 24 26
skype: bert.van.nuffelen




_______________________________________________
Ecodp-dev mailing list
Ecodp-dev at lists.okfn.org
http://lists.okfn.org/mailman/listinfo/ecodp-dev

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Stats-CKAN-2013-10-15.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 432241 bytes
Desc: Stats-CKAN-2013-10-15.xlsx
URL: <https://lists.okfn.org/mailman/private/ecodp-dev/attachments/20131015/30cb5bb1/attachment.xlsx>

