[okfn-labs] Calendar fields (date/time)
Michael Bauer
michael.bauer at okfn.org
Thu Feb 20 09:26:45 UTC 2014
Stefan,
On Wed, Feb 19, 2014 at 11:58:03PM +0100, Stefan Urbanek wrote:
> PROBLEM
>
> In the bond-yields-uk-10y dataset[1] there is:
>
> {
> "id": "Year",
> "type": "date",
> "format": "yyyy"
> },
>
> What is the problem? The field in the data file contains an integer. The metadata says it is “date”.
I've stumbled across this as well.
> DESCRIPTION
>
> From parsing and validating point of view, the field metadata as specified above are correct. We (humans) know that it is date (in fact, part of a date). We also might have a validator (of one type) for the format.
>
> From data processing it is not easy to handle this case:
>
> 1. there is no standard way of specifying not-specific date (at higher granularity than a day). There are many ways of doing it, but that's another topic.
>
> 2. the metadata says that it is a date, but it can not be converted to a valid system date datatype – neither in Python nor in SQL and I doubt about other environments as well. Reason is the point number 1.
>
> 3. what format is the format? ... but that's another issue, I'm not going into more depth here.
>
> From machine process-ability it is just an integer or a string. As '2012-04' is just a string, not april. `2012` is not `January 1st 2012`, `2012-04` is not `April 1st 2012`. There are datasets with mixed date granularity: month level and daily levels for example – all in the same dataset, with same fields. If we just converted every date-like string to a full date, we would not be able to distinguish "whole year 2012" from "April 2012" from "April 1st 2012".
Agree! I've ran into this a couple of times. In that case I'd probably not
even use a date type - but a simple integer for the year. and month and
have different columns for them.
> Example practice from data warehouses: the quite common way of modelling date dimension is to use integer to specify a date: `YYYYMMDD`. If the `MMDD` part is 0000 as in “20140000" then the value corresponds either to whole year or unspecified part of the year (we can’t tell without more information). Similar for `DD` == `00`. The values for "unspecified" might also be `9999` instead of `0000` for sorting purposes, as in “20149999”.
Michael
--
Data Diva | skype: mihi_tr | @mihi_tr
The Open Knowledge Foundation | School of Data
http://okfn.org | http://schoolofdata.org
GPG/PGP key: http://tentacleriot.eu/mihi.asc
More information about the okfn-labs
mailing list