[ckan-dev] dataset urls

Tue Apr 24 22:57:18 UTC 2012

On 24 April 2012 23:26, David Raznick <kindly at gmail.com> wrote:
>
>
> On Tue, Apr 24, 2012 at 7:52 PM, Rufus Pollock <rufus.pollock at okfn.org>
> wrote:
>>
>> On 24 April 2012 15:32, David Raznick <kindly at gmail.com> wrote:
>> >
>> >
>> > On Fri, Apr 20, 2012 at 1:50 PM, Rufus Pollock <rufus.pollock at okfn.org>
>> >
>> >
>> > My take on this is that we should just have a slightly less ugly
>> > permanent url based on the uuid.  This would look like something
>> > base64 encoded and have a one-to-one mapping with the uuid.  I have
>> > not worked out the mapping yet, but as uuids are in hex you could make
>> > them shorter and without the hyphens. So the url would be
>> > /dataset/~afdXZz34rfdsafewrA.
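A minimal sketch of the one-to-one mapping proposed above. It assumes URL-safe base64 (RFC 4648, section 5) over the uuid's 16 raw bytes, with the `=` padding stripped. The function names are hypothetical and are not part of CKAN. The sketch only shows that the 36-character hyphenated form shrinks to 22 characters and decodes back to the same uuid.

```python
import base64
import uuid


def uuid_to_slug(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes of a uuid as URL-safe base64, dropping '=' padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def slug_to_uuid(slug: str) -> uuid.UUID:
    """Invert uuid_to_slug: restore the padding, then decode back to a uuid."""
    padded = slug + "=" * (-len(slug) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    dataset_id = uuid.uuid4()
    slug = uuid_to_slug(dataset_id)
    print(len(str(dataset_id)), len(slug))  # 36 22
    assert slug_to_uuid(slug) == dataset_id
    print("/dataset/~" + slug)
```

The URL-safe alphabet uses `-` and `_` instead of `+` and `/`, so the result can sit in a path segment without escaping. It is case-sensitive, though, which matters if people ever retype these URLs.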
>>
>> At best you can get from 36 characters to ~20:
>
>
> I think we can actually make it much shorter by only taking part of the
> uuid.  We could very simply check for collision at dataset create time.
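One way to read the "take part of the uuid and check for collision at create time" idea, sketched under some assumptions. The short id is a hex prefix of the uuid, it is stored in its own unique column, and lookups match it exactly. The `taken` set stands in for a query against that hypothetical column. On a collision, the prefix is lengthened by one character.

```python
from __future__ import annotations

import uuid


def short_id(u: uuid.UUID, taken: set[str], min_len: int = 8) -> str:
    """Return the shortest hex prefix of u (at least min_len chars) not already taken.

    `taken` stands in for a lookup against a hypothetical unique short_id column,
    done at dataset create time.
    """
    hex_id = u.hex  # 32 hex digits, no hyphens
    for length in range(min_len, len(hex_id) + 1):
        candidate = hex_id[:length]
        if candidate not in taken:
            return candidate
    raise ValueError("the full uuid is already taken")


if __name__ == "__main__":
    u = uuid.UUID("12345678-9abc-4def-8123-456789abcdef")
    print(short_id(u, set()))         # 12345678
    print(short_id(u, {"12345678"}))  # 123456789
```

In practice, the check and the insert would need to happen in one transaction, or rely on a unique constraint, so that two concurrent creates cannot claim the same prefix.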
>
> Also Toby had the idea of allowing /dataset/~afdXfds/the-name-or-whatever
>
> The name part would be optional and meaningless but at least you can see the
> name.
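A sketch of how Toby's `/dataset/~<id>/<name>` form could be parsed so that the trailing name really is optional and meaningless. Only the part after `~` is used for the lookup, so a stale or mistyped name still resolves. The route pattern and function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical route: /dataset/~<short-id>, optionally followed by /<any-name>.
# The name segment is accepted but ignored; only the short id identifies the dataset.
DATASET_PATH = re.compile(r"^/dataset/~(?P<short_id>[A-Za-z0-9_-]+)(?:/[^/]*)?/?$")


def parse_dataset_path(path: str) -> Optional[str]:
    """Return the short id from a dataset URL path, or None if the path doesn't match."""
    match = DATASET_PATH.match(path)
    return match.group("short_id") if match else None


def canonical_path(short_id: str, current_name: str) -> str:
    """Build the URL to link to, with the dataset's current name as a readable suffix."""
    return f"/dataset/~{short_id}/{current_name}"


if __name__ == "__main__":
    for path in ("/dataset/~afdXfds/the-name-or-whatever",
                 "/dataset/~afdXfds/some-old-name",
                 "/dataset/~afdXfds"):
        print(path, "->", parse_dataset_path(path))  # each resolves to afdXfds
```

Links the site generates would use `canonical_path`, so people see the current name, while any old name in a bookmarked URL is simply ignored.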
>
>>
>> <https://github.com/okfn/datautil/blob/master/datautil/id.py#L4>
>>
>> It's still terrible from a UX point of view [1].
>
>
> I think it's not as terrible as broken links.  It's pretty bad UX to make name

We are talking about comment links at the moment. Re name changes: you
get broken links on GitHub if you rename your repo. We could also
implement a simple redirector that looks up old names in the dataset
revision table :-)
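A minimal sketch of the redirector idea. It assumes a hypothetical schema in which a `dataset_revision` table records every name a dataset has had. CKAN's actual revision tables differ, and the table and column names here are illustrative only. A current name is served directly. A former name gets a 301 to the dataset's current URL.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical schema: `dataset` holds each dataset's current name;
# `dataset_revision` keeps one row per (id, name, timestamp) ever saved.
SCHEMA = """
CREATE TABLE dataset (id TEXT PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE dataset_revision (
    id TEXT NOT NULL,
    name TEXT NOT NULL,
    revision_timestamp TEXT NOT NULL
);
"""


def resolve_name(conn: sqlite3.Connection, name: str) -> Tuple[int, Optional[str]]:
    """Return (200, None) for a current name, (301, new_path) for a former one, else (404, None)."""
    if conn.execute("SELECT 1 FROM dataset WHERE name = ?", (name,)).fetchone():
        return 200, None
    # Not a current name: find the dataset that most recently used it.
    row = conn.execute(
        """SELECT d.name FROM dataset_revision AS r
           JOIN dataset AS d ON d.id = r.id
           WHERE r.name = ?
           ORDER BY r.revision_timestamp DESC
           LIMIT 1""",
        (name,),
    ).fetchone()
    if row:
        return 301, "/dataset/" + row[0]
    return 404, None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dataset VALUES ('id-1', 'uk-spending')")
    conn.executemany(
        "INSERT INTO dataset_revision VALUES (?, ?, ?)",
        [("id-1", "gb-spending", "2012-01-10"), ("id-1", "uk-spending", "2012-04-20")],
    )
    print(resolve_name(conn, "gb-spending"))  # (301, '/dataset/uk-spending')
    print(resolve_name(conn, "uk-spending"))  # (200, None)
```

Checking current names first means that, if an old name is later reused by a different dataset, the new owner wins and the redirect no longer applies.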

> changing *HARD* because the last thing we want to do is confront users
> with more choices than necessary.  They should not be forced to think

Why shouldn't it be like GitHub repos? You can change the name, but
you are warned about the problems. Pick a good name.

> through the consequences of their actions and read some blurb as to why
> it's bad if we can make it avoidable.

Understood. That said, I frequently type in the names of familiar
datasets (but I may be unusual). That's never possible once we have a
somewhat random id in there. But that's then a question of usage. I
think DataHub at least is more like GitHub (or Twitter) in that
regard: I care about an entity's name a lot (compared to, say,
StackOverflow, where I always arrive via Google or similar).

>> I certainly agree the disqus_identifier should run off the pure id,
>> but I don't think disqus_url should, as I have said :-)
>>
>> rufus
>>
>> [1]: http://trac.ckan.org/ticket/2321
>>
>>
>> > The other option is to re-add an auto id column to the dataset table
>> > and use that.  I still think we should keep the uuid though for
>> > harvesting purposes.
>>
>> Looking back, I wouldn't use uuids again unless we had to for some
>> reason. I would certainly argue that related stuff and anything else
>> should use auto-increment if at all possible.
>>
> Personally I like uuids for join keys, as they mean that you have to do
> fewer round trips to the database when using them properly.  I think there
> is nothing wrong with also having an auto id column for nicer urls.  We
> could add them to resources too.
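To illustrate the round-trip point, here is a sketch that assumes a hypothetical `dataset`/`resource` schema keyed by uuid strings. Because the application generates the keys, all parent and child rows are built before the database is touched, and they go in as one batch. With an auto-increment key, the child rows could not be built until the parent INSERT had returned its id (`cursor.lastrowid` in `sqlite3`, or `INSERT ... RETURNING` on PostgreSQL). SQLite runs in-process, so here it only keeps the example self-contained. The saving applies against a networked database.

```python
import sqlite3
import uuid
from typing import List

# Hypothetical schema keyed by uuid strings generated in the application.
SCHEMA = """
CREATE TABLE dataset (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE resource (
    id TEXT PRIMARY KEY,
    dataset_id TEXT NOT NULL REFERENCES dataset (id),
    url TEXT NOT NULL
);
"""


def save_dataset(conn: sqlite3.Connection, name: str, urls: List[str]) -> str:
    """Insert a dataset and its resources in one transaction; return the dataset id."""
    # All keys are known up front, so no row has to wait for a generated parent id.
    dataset_id = str(uuid.uuid4())
    resource_rows = [(str(uuid.uuid4()), dataset_id, url) for url in urls]
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO dataset VALUES (?, ?)", (dataset_id, name))
        conn.executemany("INSERT INTO resource VALUES (?, ?, ?)", resource_rows)
    return dataset_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    ds = save_dataset(conn, "example", ["http://example.com/a.csv", "http://example.com/b.csv"])
    print(conn.execute("SELECT COUNT(*) FROM resource WHERE dataset_id = ?", (ds,)).fetchone()[0])  # 2
```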

But auto-ids don't give much nicer ids, so it seems like a lot of
work for a very small win.

To summarize: I really like meaningful, human-readable names in the URL
as the primary identifier. However, I may be unusual, and we could,
say, ask on ckan-discuss and Twitter about people's preferences.

At the same time I would have no objection to, say, shortening
resource uuids ... (but what is cost/benefit?)

Rufus