[ckan-dev] Full-text search of PDF files in CKAN?

Andrew White WhiteA at landcareresearch.co.nz
Fri Sep 12 03:27:12 UTC 2014


Greetings from a new subscriber!

Our institution has begun using CKAN for data archiving, and it has been suggested that we could also use it as a document repository. Documents would mainly be PDFs, but other file types (e.g. .doc, .txt) might be included.

One of the features we would want in a document repository is the ability to search the full text of documents, including PDFs that contain searchable text.

Has anyone implemented such a search in CKAN? What would be required - a new extension, if none exists already? Presumably Solr could provide the search if the document text could be indexed somehow, perhaps through a metadata field that contains the text and is populated automatically by parsing the document when it is added to CKAN?
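
To make the idea concrete, here is a rough sketch of the "metadata field populated from the document text" step. It is only an illustration under stated assumptions, not an existing CKAN extension or API: the function name build_index_doc, the field name "fulltext" and the max_chars cap are all hypothetical, and the text extraction itself (for example with a PDF parsing library) is assumed to have happened already.

```python
import re


def build_index_doc(dataset, extracted_texts, max_chars=50000):
    """Return a copy of `dataset` with a combined full-text field added.

    `dataset` is a plain dict of metadata; `extracted_texts` is a list of
    strings, one per parsed document. The "fulltext" key and the
    `max_chars` cap are assumptions made for this sketch only.
    """
    # Collapse the whitespace runs that text extraction tends to leave behind.
    parts = [re.sub(r"\s+", " ", text).strip() for text in extracted_texts]

    # Join the non-empty documents into one searchable string.
    combined = " ".join(part for part in parts if part)

    # Copy so the caller's metadata dict is not modified.
    doc = dict(dataset)
    doc["fulltext"] = combined[:max_chars]
    return doc


if __name__ == "__main__":
    meta = {"name": "soil-report"}
    indexed = build_index_doc(meta, ["Soil  carbon\nresults", "", " page 2 "])
    print(indexed["fulltext"])
```

The resulting dict is what would then be handed to the search index; the cap exists only because I do not know whether there is a field size limit.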

Is there any limit to the size of the metadata fields, for indexing purposes?

Regards

Andrew White
Information Systems Support Specialist
Landcare Research New Zealand
PO Box 69040
Lincoln, Canterbury 7640
New Zealand

Phone: +64 3 321 9815
Fax: +64 3 321 9998

