[okfn-labs] Calendar fields (date/time)

Stefan Urbanek stefan.urbanek at gmail.com
Sun Feb 23 09:39:55 UTC 2014


On 22 Feb 2014, at 23:07, Matthew Fullerton <matt.fullerton at gmail.com> wrote:

> A couple of points/questions,
> 
> On 21 February 2014 17:32, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> [snip]
>> 
>> I like this proposal a lot but there are certain questions I have:
>> 
>> - What do data packagers do with time series data they get that is just a
>> year (e.g. as in your example of bond yields). Personally i've found myself
>> more and more just converting these to actual dates e.g. 2004 => 2004-01-01,
>> 2004 Q3 => 2004-10-01 etc. This means you get a real date but at the cost of
>> adding some precision that wasn't really there.
> 
> I also follow this approach. For a cartodb-based project recently; I
> wanted real dates but also only had years for *some* entries. My
> simple approach had a "year only" boolean column, if true the
> visualization only showed the year part of the date.... not ideal
> 

Not that uncommon.

Now, how are generic visualisation tools going to handle this? They are not. But that’s perfectly fine.

Your example just proves my points, that:

1. there are many ways of modelling dates, times and calendars
2. we can’t handle all of them directly without additional effort, much as we would love to

> [snip]
>> 
>> So, to summarize: I agree with making the change you propose re types - are
>> you happy to open an appropriate issue re JSON Table Schema on
>> https://github.com/dataprotocols/dataprotocols/issues
>> 
>>> 
>>> Information that a field contains a year or any other combination of
>>> calendar units should be in some kind of analytical metadata. We don't have
>>> to define them yet, as they might be very case-specific.
>> 
>> 
>> I think string plus pattern/format attribute should be ok for this. I also
>> note we have a units spec http://dataprotocols.org/units/ (not much used yet
>> and not formally integrated into JSON Table Schema but it could be)
> 
> Just to check I understand, Rufus, your proposal is to always require
> a full (date)(time) in a full format (agreeing with Stefan) but then
> to use a format specifier to specify which elements are actually data
> that should be used?
> 
> One other possibility would be to consider adding year, month and
> yearmonth types. It's probably too much, but on the other hand only an
> extension of the concept of date, time and datetime. I suspect there
> are many open data sets where year meaning "the whole year" is used.
> 

Please no. A year is just an integer: from a low-level processing perspective it is a plain integer, nothing more, nothing less. That it is a “year” as a calendar unit is knowledge we hold at a higher level. We have not even mentioned the distinction between calendar and fiscal years here. As for “yearmonth”: is it going to be an “integer” or a “string”? If a string, then in what format? This information belongs to another kind of metadata.

We can’t avoid one layer of ETL that will either convert the column, add an additional converted column, or add a column with additional information (as you have done in the example above).
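
A minimal sketch of such an ETL step, assuming rows are plain dictionaries and the year sits in a column called "year". The function name, the column names "date" and "year_only" (borrowed from the boolean column described above) and the sample row are all made up for illustration:

```python
from datetime import date


def add_date_columns(rows, source="year"):
    """Return copies of the rows with a converted date and a precision flag.

    Hypothetical helper: the original year column is kept untouched, a full
    ISO date (January 1st of that year) is added next to it, and "year_only"
    records that the month and day are not real information.
    """
    converted = []
    for row in rows:
        new_row = dict(row)  # do not modify the caller's data
        new_row["date"] = date(int(row[source]), 1, 1).isoformat()
        new_row["year_only"] = True
        converted.append(new_row)
    return converted


if __name__ == "__main__":
    # Illustrative row only, not real data.
    print(add_date_columns([{"year": "2004", "value": 1.0}]))
```

Keeping the source column next to the converted one means nothing is lost: a tool that wants a real date uses "date", and a tool that cares about the original precision can still see it.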

Here is one possible solution, in the form of a use case:

My tool will understand a field to be a “year” if the field’s “unit” is “year”, and will interpret it as January 1st of that year. It will understand a field to be a “month of a year” when the field’s unit is “yearmonth”, its data type is “string” and its format is “YYYY-MM”; the result will be the 1st of that month. Anything else will be considered erroneous. That’s my tool: you conform to my standard to have your data displayed correctly, or your data will not display at all.
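
The rules of this imaginary tool could be sketched roughly as follows. This is only an illustration: the metadata keys "type", "unit" and "format" and the function name are assumptions for the example, not part of JSON Table Schema or of any existing tool.

```python
import re
from datetime import date


def interpret_calendar_field(value, field):
    """Turn a raw value into a date according to the field's metadata.

    `field` is a hypothetical dict of field metadata with the keys
    "type", "unit" and "format". Anything the tool does not understand
    is treated as erroneous and raises ValueError.
    """
    unit = field.get("unit")

    if unit == "year":
        # A year is interpreted as January 1st of that year.
        return date(int(value), 1, 1)

    if unit == "yearmonth":
        if field.get("type") != "string" or field.get("format") != "YYYY-MM":
            raise ValueError("yearmonth must be a string in YYYY-MM format")
        match = re.fullmatch(r"(\d{4})-(\d{2})", value)
        if match is None:
            raise ValueError("value does not match YYYY-MM: %r" % (value,))
        # A month of a year is interpreted as the 1st of that month;
        # date() itself rejects impossible months such as 13.
        return date(int(match.group(1)), int(match.group(2)), 1)

    raise ValueError("unsupported unit: %r" % (unit,))


if __name__ == "__main__":
    print(interpret_calendar_field(2004, {"type": "integer", "unit": "year"}))
    print(interpret_calendar_field(
        "2004-03", {"type": "string", "unit": "yearmonth", "format": "YYYY-MM"}))
```

The point of the sketch is that the stored type stays a plain integer or string; the calendar meaning comes entirely from the extra metadata, and each tool is free to define its own rules.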

I do not dare to try to standardise any of this at the moment. We don’t know yet what is out there, in the “raw datas of the world”. I guess our problem is patience. We finally have open raw data and we would like to do all the fancy analysis on top of them. Directly.

The only thing we can do is to prevent issues in the future.

Stefan
