[ckan-dev] dataset urls

David Raznick kindly at gmail.com
Wed Apr 25 08:13:39 UTC 2012


On Wed, Apr 25, 2012 at 1:24 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:

> On 25 April 2012 01:07, David Raznick <kindly at gmail.com> wrote:
> >
> >
> >> We are talking about comment links atm.
> >>
> >> Re name change you get broken
> >> links on github if you rename your repo.
> >>
> >> We could also implement a
> >> simple redirector by looking up in the dataset revision table for old
> >> names :-)
> >
> >
> > We considered this but it would not work unless we were sure that all
> > names are unique forever.
>
> But why does this need to be perfect? If someone renames and then some
> other dataset replaces it fine - o/w this would work :-)
>

> I'm not sure we are debating the same thing here. dataset uuid can be used
> for things that absolutely need to be permanent forever (e.g. rdf uris,
> permanent identifiers for syncing). But for other stuff it's not the
> end of the world if something breaks (if that is rare and people are
> warned of risk)
>
> >>
> >> > changing *HARD* because the last thing we want to do is confront users
> >> > with more choices than necessary.  They should not be forced to think
> >>
> >> Why shouldn't it be like github repos. You can change but you are
> >> warned about problems. Pick a good name.
> >
> >
> > If we cared that much about the name we would not sluggify the title and
> > force people to make good ones.  Github forces you to do this.
>
> We used to do this. I have pushed several times for making dataset
> name sluggification better (remove article, warning people if long
> ...). Github btw now has something similar to what we do.
>
> >> through the consequences of their actions and read some blurb as to why
> >> it's bad if we can make it avoidable.
> >
> >> Understood. That said I frequently type in the names of familiar
> >> datasets (but i may be unusual). That's never possible once we have
> >> somewhat random id in there. But that's then a question of usage. I
> >> think DataHub at least is more like GitHub (or Twitter) in that
> >> regard: I care about this entity's name a lot (compared to say
> >> StackOverflow where I always arrive via google or similar).
> >>
> > I think that the relevance of the name has much less consequence to us
> > than github but more than stackoverflow.  I am happy to keep the ability to
>
> OK, interesting. I don't see that way so much.
>
> > reference by name only in the url, but not give that out when
> > systematically creating permanent links, like in this case.
>
> To repeat, the disqus system will reference the permanent identifier
> and the disqus_url is, IIRC, a convenience (used from recent comments
> etc).
>
> > This can include things like activity streams, social stuff, apis to
> > update qa information, feeds, and anywhere we give out urls that we expect
> > other services to use. We could use just uuids for these but some would
> > also benefit from being less ugly.
>
> There are two distinct discussions:
>
> * Do we want uuids exposed for datasets (by default) in the UI. I'm
> saying no, you're saying yes :-)
>

> * Do we want uuids exposed for datasets elsewhere (e.g. in activity
> streams, qa etc). Probable agreement ...
>  * I'm not quite sure what this means. Internally we ref the dataset
> object. Hence we can always change the url link at least in our system
> as this updates. For some things like RSS feeds given out to others
> this is more problematic (and i'd be happy with uuid/{friendly-name}).
>
> * All agree: If we have to use uuids we can make them less ugly (e.g.
> by appending title). I'm concerned about shortening that risks
> collisions because you end up back where you started ...


I think there is a misunderstanding about what was meant by collisions.  They
should *never* happen at read time.  What I meant was: when saving a dataset,
make sure the first 9 hex digits of the new dataset's uuid are not already in
the database; if they are (a collision), generate a new uuid.  This gives us
a space of about 68 billion (16^9), and the shortened form is down to 6
characters.  Collisions will still happen only rarely as we approach a
billion datasets.
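
To make the save-time check concrete, here is a rough sketch, not actual
CKAN code.  The names new_dataset_uuid and short_form are made up for
illustration, and the in-memory set stands in for a lookup against the
database.  Reading "6 characters" as a URL-safe base64 encoding of the 36
bits in 9 hex digits is also an assumption on my part.

```python
import string
import uuid

PREFIX_LEN = 9  # hex digits kept as the short id; 16**9 = 68,719,476,736

# URL-safe base64 alphabet (assumed encoding for the 6-character form).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def new_dataset_uuid(existing_prefixes, make_uuid=uuid.uuid4, max_tries=100):
    """Return a fresh uuid whose first PREFIX_LEN hex digits are unused.

    existing_prefixes is a plain set here; a real system would query the
    database (hypothetical stand-in).
    """
    for _ in range(max_tries):
        candidate = make_uuid()
        prefix = candidate.hex[:PREFIX_LEN]
        if prefix not in existing_prefixes:
            # No collision: record the prefix and keep this uuid.
            existing_prefixes.add(prefix)
            return candidate
        # Collision at save time: loop and generate another uuid.
    raise RuntimeError("could not find an unused uuid prefix")


def short_form(prefix):
    """Encode a 9-hex-digit prefix (36 bits) as 6 base64-style characters."""
    n = int(prefix, 16)
    chars = []
    for _ in range(6):
        n, r = divmod(n, 64)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))
```

The point is that the uniqueness check runs only when a dataset is saved, so
a short id resolves to at most one dataset at read time.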