[ckan-dev] CKAN showing up in Google searches - Language used

Stefan Oderbolz stefan.oderbolz at liip.ch
Mon Feb 2 22:40:08 UTC 2015


Hi there,

while you could use the robots.txt "trick" to exclude URLs, it's not a very
nice practice, because then the content is basically lost for Google et al.
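
To illustrate what "lost" means here, a minimal sketch using Python's
standard urllib.robotparser. The rules and paths below are made-up examples,
not the real CKAN robots.txt, and is_crawlable is just a hypothetical helper
name. Once a language prefix is disallowed, a well-behaved crawler will not
fetch anything below it:

```python
from urllib.robotparser import RobotFileParser

# Example rules only (assumption): two language prefixes are disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ar/
Disallow: /de/
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/dataset/house-mouse", "/ar/dataset/house-mouse"):
        print(path, is_crawlable(path))
```

The default-language URL stays crawlable, while the same dataset under /ar/
is off limits to the crawler entirely.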

The preferred way to handle this is by using canonical links (see
https://support.google.com/webmasters/answer/139066?hl=en). This is a way
to tell a search engine your preferred URL for content and even specify that
each page exists in several languages.
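
As a rough sketch of what such a patch or extension might emit into the
<head> of each page (this is not existing CKAN code; alternate_links is a
hypothetical helper, and I'm assuming the default language is served without
a prefix and every other locale under /<locale>/, as in the robots.txt diff
quoted below):

```python
from html import escape


def alternate_links(base_url, path, default_locale, locales):
    """Build <link> tags for one page: a canonical URL plus one
    rel="alternate" hreflang entry per available locale."""
    # Assumption: the default-language URL is the preferred (canonical) one.
    canonical = f"{base_url}{path}"
    tags = [f'<link rel="canonical" href="{escape(canonical)}">']
    for locale in locales:
        if locale == default_locale:
            href = canonical
        else:
            href = f"{base_url}/{locale}{path}"
        # hreflang uses hyphens (pt-BR), the URL prefixes use underscores.
        hreflang = locale.replace("_", "-")
        tags.append(
            f'<link rel="alternate" hreflang="{hreflang}" '
            f'href="{escape(href)}">'
        )
    return tags


if __name__ == "__main__":
    for tag in alternate_links(
        "https://example.org", "/dataset/house-mouse", "en", ["en", "ar", "pt_BR"]
    ):
        print(tag)
```

Pointing the canonical at the default-language URL is my assumption for
Aaron's case; how exactly canonical and hreflang should be combined is worth
checking against the Google documentation linked above.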

Afaik this has not (yet) been implemented in CKAN. It would definitely be a
very welcome patch or extension.

Regards Stefan
On Feb 2, 2015 3:17 AM, "Alex (Maxious) Sadleir" <maxious at gmail.com> wrote:

> You can patch robots.txt in the CKAN source to exclude languages from
> search results (while users can still switch on the site unless you
> also disable languages in the config). I also exclude /_tracking
>
> diff --git a/ckan/public/robots.txt b/ckan/public/robots.txt
> index 279a33a..e410bdc 100644
> --- a/ckan/public/robots.txt
> +++ b/ckan/public/robots.txt
> @@ -3,6 +3,50 @@ Disallow: /dataset/rate/
>  Disallow: /revision/
>  Disallow: /dataset/*/history
>  Disallow: /api/
> +Disallow: /_tracking
> +Disallow: /_tracking
> +
> +Disallow: /ar/
> +Disallow: /bg/
> +Disallow: /ca/
> +Disallow: /cs_CZ/
> +Disallow: /da_DK/
> +Disallow: /de/
> +Disallow: /dv/
> +Disallow: /el/
> +Disallow: /en_AU/
> +Disallow: /en_GB/
> +Disallow: /es/
> +Disallow: /es_AR/
> +Disallow: /fa_IR/
> +Disallow: /fi/
> +Disallow: /fr/
> +Disallow: /hu/
> +Disallow: /id/
> +Disallow: /is/
> +Disallow: /it/
> +Disallow: /ja/
> +Disallow: /km/
> +Disallow: /ko_KR/
> +Disallow: /lt/
> +Disallow: /lv/
> +Disallow: /my_MM/
> +Disallow: /nl/
> +Disallow: /no/
> +Disallow: /pl/
> +Disallow: /pt_BR/
> +Disallow: /ro/
> +Disallow: /ru/
> +Disallow: /sk/
> +Disallow: /sl/
> +Disallow: /sq/
> +Disallow: /sr/
> +Disallow: /sr_Latn/
> +Disallow: /sv/
> +Disallow: /tr/
> +Disallow: /uk_UA/
> +Disallow: /zh_CN/
> +Disallow: /zh_TW/
>
>  User-Agent: *
>  Crawl-Delay: 10
>
> On Mon, Feb 2, 2015 at 12:59 PM, Aaron McGlinchy
> <McGlinchyA at landcareresearch.co.nz> wrote:
> > Hi, our instance of CKAN now has datasets showing up in google searches,
> which is great.  However I have noticed that often the link which comes up
> in the google search takes the user to a 'non-default' language version of
> the dataset or resource.  Ie. Our language is English, but a search for
> example for:  house mouse data  returns as the number 1 result one of our
> resources, but with the language as Arabic.  This is perfectly fine if the
> user doing the search is wanting the Arabic language interface, but not
> quite so user friendly if the user wants the English interface.
> >
> > Is there anything that can be done to influence this behaviour (without
> removing language options that some other users might wish to use)?
> >
> > Thanks
> > Aaron
> >
> > _______________________________________________
> > ckan-dev mailing list
> > ckan-dev at lists.okfn.org
> > https://lists.okfn.org/mailman/listinfo/ckan-dev
> > Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev