blog, Couchbase

Couchbase Date Functions

Today in my tests with Couchbase I had the opportunity to play with dates.  For the sample database we pushed 1 stores worth of sold data out of our CRM to the Couchbase bucket names ‘sales’.

This sample sales data set had just under 13,000 records.  The deal date was not time stamped so we formatted it to the standard date format yyyy-mm-dd.  In doing this we are able to use the built in date functions for N1QL (the Couchbase SQL language for json document).

After the data was imported into the bucket we setup a primary index on the bucket and ran the following query:

  • extract the year from the date
  • count the number of deals (or sales) for each year
  • sort descending from year with highest number of sales

SELECT DATE_PART_STR(dealDate, ‘year’) as `y`, count(*) AS `amt`
FROM sales
GROUP BY DATE_PART_STR(dealDate, ‘year’)
ORDER BY `amt` DESC

This query took ~2 seconds

Being new to Couchbase I’m still trying to figure out the best way to index the data to maximum query speed.  With a data set of 13,000 records a 2 second query is a long time.  I can’t imagine what it will be like with a few hundred thousand.

So indexing to the rescue!  The following index was created:
CREATE INDEX `idx_dealdate` ON `sales`(`dealDate`)

We then re-ran the query again and . . . . ~2 seconds. Dang

My first assumption would be to create an index on the year the same way that I would query the year from the day with the following:

CREATE INDEX `idx_deal_year` ON sales(DATE_PART_STR(dealDate, ‘year’))

This statement ran and created an index.  However it did not speed up the query.

So what if I run a query using substr instead of the date function?

SELECT SUBSTR(dealDate, 0, 4) AS `y`, count(*) AS `amt`
FROM sales
GROUP BY SUBSTR(dealDate, 0, 4)
ORDER BY `amt` DESC

This takes the same amount of time to run ~2 seconds.

Let’s try a few other queries on the date field:

SELECT *
FROM sales
WHERE dealDate LIKE ‘2018-02%’

or even:

SELECT *
FROM sales
WHERE CONTAINS(dealDate, ‘2018-02’)

Every query takes ~2 seconds.  The last using the CONTAINS function consistently executes a slight bit faster.

It finally occurred to me that I had this issue before.  For some reason the query is not using the idx_dealdate.  This is because I was not filtering the data based on the index (CREATE INDEX `idx_dealdate` ON `sales`(`dealDate`)).  So finally running the following:

SELECT DATE_PART_STR(dealDate, ‘year’) as `y`, count(*) AS `amt`
FROM sales
WHERE dealDate IS NOT NULL
GROUP BY DATE_PART_STR(dealDate, ‘year’)
ORDER BY `amt` DESC

executed in ~500ms – that’s a 75% speed savings!

The WHERE clause caused Couchbase to use the index and greatly increased the speed of the query.

Leave a Reply