[ckan-changes] [okfn/ckan] 3736f6: [doc/using-data-api][m]: detailed documentation an...
GitHub
noreply at github.com
Sat Apr 21 02:10:16 UTC 2012
Branch: refs/heads/master
Home: https://github.com/okfn/ckan
Commit: 3736f616ce20485b090ffb73ae1faee36631386f
https://github.com/okfn/ckan/commit/3736f616ce20485b090ffb73ae1faee36631386f
Author: Rufus Pollock <rufus.pollock at okfn.org>
Date: 2012-04-20 (Fri, 20 Apr 2012)
Changed paths:
M doc/datastore.rst
M doc/index.rst
A doc/using-data-api.rst
Log Message:
-----------
[doc/using-data-api][m]: detailed documentation and tutorial for the data API (in essence a tutorial for ElasticSearch).
diff --git a/doc/datastore.rst b/doc/datastore.rst
index 395de84..15fcd8f 100644
--- a/doc/datastore.rst
+++ b/doc/datastore.rst
@@ -30,30 +30,21 @@ the spreadsheet data is stored in the DataStore one would be able to access
individual spreadsheet rows via a simple web-api as well as being able to make
queries over the spreadsheet contents.
-Using the DataStore Data API
-============================
+The DataStore Data API
+======================
The DataStore's Data API, which derives from the underlying ElasticSearch
data-table, is RESTful and JSON-based with extensive query capabilities.
-Each resource in a CKAN instance has an associated DataStore 'database'. This
-database will be accessible via a web interface at::
+Each resource in a CKAN instance has an associated DataStore 'table'. This
+table will be accessible via a web interface at::
/api/data/{resource-id}
This interface to this data is *exactly* the same as that provided by
ElasticSearch to documents of a specific type in one of its indices.
-So, for example, to see the fields in this database do::
-
- /api/data/{resource-id}/_mapping
-
-To do simple search do::
-
- /api/data/{resource-id}/_search?q=abc
-
-For more on searching see: http://www.elasticsearch.org/guide/reference/api/search/uri-request.html
-
+For a detailed tutorial on using this API see :doc:`using-data-api`.
Installation and Configuration
==============================
diff --git a/doc/index.rst b/doc/index.rst
index ed011a7..67e1153 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -60,6 +60,8 @@ The CKAN API
:maxdepth: 3
api
+ api-tutorial
+ using-data-api
General Administration
======================
diff --git a/doc/using-data-api.rst b/doc/using-data-api.rst
new file mode 100644
index 0000000..09c26b1
--- /dev/null
+++ b/doc/using-data-api.rst
@@ -0,0 +1,403 @@
+==================
+Using the Data API
+==================
+
+Introduction
+============
+
+The Data API builds directly on ElasticSearch, with a resource API endpoint
+being equivalent to a single index 'type' in ElasticSearch (we tend to refer to
+it as a 'table'). This means you can often directly re-use `ElasticSearch
+client libraries`_ when connecting to the API endpoint.
+
+Furthermore, it means that almost all of what is presented below is generally
+applicable to ElasticSearch.
+
+.. _ElasticSearch client libraries: http://www.elasticsearch.org/guide/appendix/clients.html
+
+Quickstart
+==========
+
+``endpoint`` refers to the data API endpoint (or ElasticSearch index / table).
+
+Key urls:
+
+* Query: ``{endpoint}/_search`` (in ElasticSearch < 0.19 this will return an
+ error if visited without a query parameter)
+
+ * Query example: ``{endpoint}/_search?size=5&pretty=true``
+
+* Schema (Mapping): ``{endpoint}/_mapping``
+
+Examples
+--------
+
+cURL (or Browser)
+~~~~~~~~~~~~~~~~~
+
+The following examples utilize the <a href="http://curl.haxx.se/">cURL</a>
+command line utility. If you prefer, you you can just open the relevant urls in
+your browser::
+
+ // added pretty=true to get the json results pretty printed
+ curl {endpoint}/_search?q=title:jones&size=5&pretty=true</pre>
+
+Javascript
+~~~~~~~~~~~
+
+A simple ajax (JSONP) request to the data API using jQuery::
+
+ var data = {
+ size: 5 // get 5 results
+ q: 'title:jones' // query on the title field for 'jones'
+ };
+ $.ajax({
+ url: {endpoint}/_search,
+ dataType: 'jsonp',
+ success: function(data) {
+ alert('Total results found: ' + data.hits.total)
+ }
+ });
+
+
+Querying
+========
+
+Basic Queries Using Only the Query String
+-----------------------------------------
+
+Basic queries can be done using only query string parameters in the URL. For
+example, the following searches for text 'hello' in any field in any document
+and returns at most 5 results::
+
+ {endpoint}/_search?q=hello&size=5
+
+Basic queries like this have the advantage that they only involve accessing a
+URL and thus, for example, can be performed just using any web browser.
+However, this method is limited and does not give you access to most of the
+more powerful query features.
+
+Basic queries use the `q` query string parameter which supports the `Lucene
+query parser syntax`_ and hence filters on specific fields (e.g. `fieldname:value`), wildcards (e.g. `abc*`) and more.
+
+.. _Lucene query parser syntax: http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html
+
+There are a variety of other options (e.g. size, from etc) that you can also
+specify to customize the query and its results. Full details can be found in
+the `ElasticSearch URI request docs`_.
+
+.. _ElasticSearch URI request docs: http://www.elasticsearch.org/guide/reference/api/search/uri-request.html
+
+Full Query API
+--------------
+
+More powerful and complex queries, including those that involve faceting and
+statistical operations, should use the full ElasticSearch query language and API.
+
+In the query language queries are written as a JSON structure and is then sent
+to the query endpoint (details of the query langague below). There are two
+options for how a query is sent to the search endpoint:
+
+1. Either as the value of a source query parameter e.g.::
+
+ {endpoint}/_search?source={Query-as-JSON}
+
+2. Or in the request body, e.g.::
+
+ curl -XGET {endpoint}/_search -d 'Query-as-JSON'
+
+ For example::
+
+ curl -XGET {endpoint}/_search -d '{
+ "query" : {
+ "term" : { "user": "kimchy" }
+ }
+ }'
+
+
+Query Language
+==============
+
+Queries are JSON objects with the following structure (each of the main
+sections has more detail below)::
+
+ {
+ size: # number of results to return (defaults to 10)
+ from: # offset into results (defaults to 0)
+ fields: # list of document fields that should be returned - http://elasticsearch.org/guide/reference/api/search/fields.html
+ sort: # define sort order - see http://elasticsearch.org/guide/reference/api/search/sort.html
+
+ query: {
+ # "query" object following the Query DSL: http://elasticsearch.org/guide/reference/query-dsl/
+ # details below
+ },
+
+ facets: {
+ # facets specifications
+ # Facets provide summary information about a particular field or fields in the data
+ }
+
+ # special case for situations where you want to apply filter/query to results but *not* to facets
+ filter: {
+ # filter objects
+ # a filter is a simple "filter" (query) on a specific field.
+ # Simple means e.g. checking against a specific value or range of values
+ },
+ }
+
+Query results look like::
+
+ {
+ # some info about the query (which shards it used, how long it took etc)
+ ...
+ # the results
+ hits: {
+ total: # total number of matching documents
+ hits: [
+ # list of "hits" returned
+ {
+ _id: # id of document
+ score: # the search index score
+ _source: {
+ # document 'source' (i.e. the original JSON document you sent to the index
+ }
+ }
+ ]
+ }
+ # facets if these were requested
+ facets: {
+ ...
+ }
+ }
+
+Query DSL: Overview
+-------------------
+
+Query objects are built up of sub-components. These sub-components are either
+basic or compound. Compound sub-components may contains other sub-components
+while basic may not. Example::
+
+ {
+ "query": {
+ # compound component
+ "bool": {
+ # compound component
+ "must": {
+ # basic component
+ "term": {
+ "user": "jones"
+ }
+ }
+ # compound component
+ "must_not": {
+ # basic component
+ "range" : {
+ "age" : {
+ "from" : 10,
+ "to" : 20
+ }
+ }
+ }
+ }
+ }
+ }
+
+In addition, and somewhat confusingly, ElasticSearch distinguishes between
+sub-components that are "queries" and those that are "filters". Filters, are
+really special kind of queries that are: mostly basic (though boolean
+compounding is alllowed); limited to one field or operation and which, as such,
+are especially performant.
+
+Examples, of filters are (full list on RHS at the bottom of the query-dsl_ page):
+
+ * term: filter on a value for a field
+ * range: filter for a field having a range of values (>=, <= etc)
+ * geo_bbox: geo bounding box
+ * geo_distance: geo distance
+
+.. _query-dsl: http://elasticsearch.org/guide/reference/query-dsl/
+
+Rather than attempting to set out all the constraints and options of the
+query-dsl we now offer a variety of examples.
+
+Examples
+--------
+
+Match all / Find Everything
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ {
+ "query": {
+ "match_all": {}
+ }
+ }
+
+Classic Search-Box Style Full-Text Query
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This will perform a full-text style query across all fields. The query string
+supports the `Lucene query parser syntax`_ and hence filters on specific fields
+(e.g. `fieldname:value`), wildcards (e.g. `abc*`) as well as a variety of
+options. For full details see the query-string_ documentation.
+
+::
+
+ {
+ "query": {
+ "query_string": {
+ "query": {query string}
+ }
+ }
+ }
+
+.. _query-string: http://elasticsearch.org/guide/reference/query-dsl/query-string-query.html
+
+Filter on One Field
+~~~~~~~~~~~~~~~~~~~
+
+::
+
+ {
+ "query": {
+ "term": {
+ {field-name}: {value}
+ }
+ }
+ }
+
+High performance equivalent using filters::
+
+ {
+ "query": {
+ "constant_score": {
+ "filter": {
+ "term": {
+ # note that value should be *lower-cased*
+ {field-name}: {value}
+ }
+ }
+ }
+ }
+
+Find all documents with value in a range
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This can be used both for text ranges (e.g. A to Z), numeric ranges (10-20) and
+for dates (ElasticSearch will converts dates to ISO 8601 format so you can
+search as 1900-01-01 to 1920-02-03).
+
+::
+
+ {
+ "query": {
+ "constant_score": {
+ "filter": {
+ "range": {
+ {field-name}: {
+ "from": {lower-value}
+ "to": {upper-value}
+ }
+ }
+ }
+ }
+ }
+ }
+
+For more details see `range filters`_.
+
+.. _range filters: http://elasticsearch.org/guide/reference/query-dsl/range-filter.html
+
+Full-Text Query plus Filter on a Field
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+ {
+ "query": {
+ "query_string": {
+ "query": {query string}
+ },
+ "term": {
+ {field}: {value}
+ }
+ }
+ }
+
+
+Filter on two fields
+~~~~~~~~~~~~~~~~~~~~
+
+Note that you cannot, unfortunately, have a simple and query by adding two
+filters inside the query element. Instead you need an 'and' clause in a filter
+(which in turn requires nesting in 'filtered'). You could also achieve the same
+result here using a `bool query`_.
+
+.. _bool query: http://elasticsearch.org/guide/reference/query-dsl/bool-query.html
+
+::
+
+ {
+ "query": {
+ "filtered": {
+ "query": {
+ "match_all": {}
+ },
+ "filter": {
+ "and": [
+ {
+ "range" : {
+ "b" : {
+ "from" : 4,
+ "to" : "8"
+ }
+ },
+ },
+ {
+ "term": {
+ "a": "john"
+ }
+ }
+ ]
+ }
+ }
+ }
+ }
+
+Facets
+------
+
+Facets provide a way to get summary information about then data in an
+elasticsearch table, for example counts of distinct values.
+
+TODO: complete
+
+
+Schema Mapping
+==============
+
+As the ElasticSearch documentation states:
+
+ Mapping is the process of defining how a document should be mapped to the
+ Search Engine, including its searchable characteristics such as which fields
+ are searchable and if/how they are tokenized. In ElasticSearch, an index may
+ store documents of different “mapping types”. ElasticSearch allows one to
+ associate multiple mapping definitions for each mapping type.
+
+ Explicit mapping is defined on an index/type level. By default, there isn't a
+ need to define an explicit mapping, since one is automatically created and
+ registered when a new type or new field is introduced (with no performance
+ overhead) and have sensible defaults. Only when the defaults need to be
+ overridden must a mapping definition be provided.
+
+Relevant docs: http://elasticsearch.org/guide/reference/mapping/.
+
+
+JSONP support
+=============
+
+JSONP support is available on any request via a simple callback query string parameter::
+
+ ?callback=my_callback_name
+
================================================================
More information about the ckan-changes
mailing list