blog, Couchbase

Speed Up Couchbase Select Query

I fell in love with Couchbase a few months ago and we have been working to deploy it where I work on a major basis.  This NoSQL database is feature rich and crazy fast.

There was a query that was running a little slow in a mobile app that I deployed.  The query, which is done on the server side not in the phone, was taking just under a second and it appeared to be slow in the phone.  I wasn’t sure why the query was taking this long and attempted to speed it up.

Now a little bit of necessary info:   The database has over 2.5 million documents (records) in the customer bucket (table). The server is dedicated to Couchbase. It has 8 cores, 16 GB of RAM, SSD drives and running the community edition of Couchbase.  The bucket has an index for the customer information that is being queried.  The json documents can vary in size based on how much information we have on the customer.  All queries need to relate to the store that the end user is logged into the mobile app with. The portion of the json document that is being delivered back is between 3.5k and 5k.

A one second delay doesn’t sound like much but in the phone it seems to be a long time.

This is a sample of the data that gets output (it is just a portion of the json document):

[
  {
    "customer": {
      "assignedTo": {
        "bdcperson": [],
        "fimanager": [],
        "salesmanager": [
          {
            "id": null,
            "key": "0",
            "name": {
              "first": null,
              "full": " ",
              "last": null,
              "middle": null,
              "prefix": null,
              "suffix": null
            },
            "priority": "1"
          }
        ],
        "salesperson": [
          {
            "id": null,
            "key": "6401",
            "name": {
              "first": null,
              "full": "David Defoe",
              "last": null,
              "middle": null,
              "prefix": null,
              "suffix": null
            },
            "priority": "1"
          }
        ],
        "serviceadvisor": [
          {
            "id": null,
            "key": "0",
            "name": {
              "first": null,
              "full": " ",
              "last": null,
              "middle": null,
              "prefix": null,
              "suffix": null
            },
            "priority": "1"
          }
        ]
      },
      "communications": {
        "addresses": {
          "primary": {
            "city": "",
            "county": "",
            "state": null,
            "stateAbbreviation": "",
            "street1": "",
            "street2": "",
            "zip": ""
          }
        },
        "emails": {
          "primary": "taskmaster@marvel.com",
          "secondary": "die@test.com"
        },
        "phones": {
          "cell": "",
          "fax": null,
          "home": "",
          "primary": "2125551212",
          "secondary": "",
          "work": "2125551212"
        }
      },
      "doNotContact": {
        "bulkemail": null,
        "email": "NO",
        "fax": "NO",
        "letter": "NO",
        "phone": "NO",
        "spamtext": null,
        "text": "YES"
      },
      "dob": null,
      "driversLicense": {
        "expirationDate": null,
        "number": null,
        "stateAbbreviation": null
      },
      "id": "",
      "key": "6901",
      "name": {
        "first": "Tony",
        "full": "Tony Masters",
        "last": "Masters",
        "middle": "",
        "prefix": "",
        "suffix": ""
      },
      "status": "Active",
      "type": "Individual"
    }
  }
]

In my N1QL query (Couchbase json query language) I’m specifying the unique store number and the unique customer number:

SELECT entities.customer
FROM `customers`
WHERE entities.customer.`key` = '6901'
  AND entities.store.`id` = '15212'

This query was averaging around 800ms to complete.

There is an index on this customer object along with the store number:

idx_customers on ["((`entities`.`store`).`id`)","(`entities`.`customer`)"]

there is also an index on just the store number and the customer number:

idx_cust_store_key on ["((`entities`.`customer`).`key`)","((`entities`.`store`).`id`)"]

Couchbase has this great feature in their GUI that allows you to explain the query. When I look at the “explain” the query is using the idx_customers index. Because I’m specifically targeting only the store and customer id’s it seems that the query should have used the idx_cust_store_key. Plus Couchbase says that order in the WHERE clause matters when it determines which index to use.

At this point I had to try to force the query to use the index I wanted. So I added a USE INDEX clause to the N1QL statement:

SELECT entities.customer
FROM `customers`
USE INDEX (idx_cust_cpin_key USING GSI)
WHERE entities.customer.`key` = '6901'
  AND entities.store.`id` = '15212'

This time when I looked at the “explain” it was using the index I wanted and the result took 6.92ms. This was a HUGE improvement.

Just to compare the speed I decided to run a test and query the document ID from the database directly and see what the speed difference was for running the properly indexed query vs a direct document get.

SELECT entities.customer FROM `customers` USE KEYS 'cust::6583C53C160953D9F8476200D8C62C735B9B2931'

The result of the direct document get was 10.53ms. It was faster to use the index!

Leave a Reply