Overview

These are notes on how to connect Django with AWS OpenSearch. The following article was helpful.

However, that article targets Elasticsearch, so some changes were needed to make it work with OpenSearch.

Changes

The OpenSearch-specific changes start at the "Elasticsearch Setup" section of the article.

Specifically, the following two libraries were required.

(env)$ pip install opensearch-py
(env)$ pip install django-opensearch-dsl
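
Besides installing the packages, the app has to be registered and pointed at the cluster. The following is a minimal sketch of the settings, not a definitive configuration. django-opensearch-dsl reads its connection from the OPENSEARCH_DSL setting. The helper build_opensearch_dsl, the environment variable names, and the use of basic auth over HTTPS are assumptions made for illustration; an AWS domain may instead require IAM-based request signing.

```python
import os


def build_opensearch_dsl(endpoint, username, password):
    """Build the connection dict for the OPENSEARCH_DSL setting.

    Hypothetical helper: assumes basic auth against an HTTPS endpoint,
    which is only one of several ways to authenticate.
    """
    return {
        "default": {
            "hosts": endpoint,
            "http_auth": (username, password),
            "use_ssl": True,
            "verify_certs": True,
        }
    }


# In settings.py: register the app and point it at the cluster.
INSTALLED_APPS = [
    # ... the project's existing apps go here ...
    "django_opensearch_dsl",
]

# The environment variable names below are assumptions, not library conventions.
OPENSEARCH_DSL = build_opensearch_dsl(
    endpoint=os.environ.get("OPENSEARCH_ENDPOINT", "https://localhost:9200"),
    username=os.environ.get("OPENSEARCH_USER", "admin"),
    password=os.environ.get("OPENSEARCH_PASSWORD", ""),
)
```

Keeping the credentials in environment variables avoids committing them alongside settings.py.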

After that, by replacing the django_elasticsearch_dsl module with django_opensearch_dsl and elasticsearch_dsl with opensearchpy in the imports, I was able to proceed as described in the article.

For example, it looks like this:

# blog/documents.py

from django.contrib.auth.models import User
from django_opensearch_dsl import Document, fields # Changed to opensearch
from django_opensearch_dsl.registries import registry # Changed to opensearch

from blog.models import Category, Article


@registry.register_document
class UserDocument(Document):
    class Index:
        name = 'users'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = User
        fields = [
            'id',
            'first_name',
            'last_name',
            'username',
        ]


@registry.register_document
class CategoryDocument(Document):
    id = fields.IntegerField()

    class Index:
        name = 'categories'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = Category
        fields = [
            'name',
            'description',
        ]


@registry.register_document
class ArticleDocument(Document):
    author = fields.ObjectField(properties={
        'id': fields.IntegerField(),
        'first_name': fields.TextField(),
        'last_name': fields.TextField(),
        'username': fields.TextField(),
    })
    categories = fields.ObjectField(properties={
        'id': fields.IntegerField(),
        'name': fields.TextField(),
        'description': fields.TextField(),
    })
    type = fields.TextField(attr='type_to_string')

    class Index:
        name = 'articles'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = Article
        fields = [
            'title',
            'content',
            'created_datetime',
            'updated_datetime',
        ]
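
The renaming described above can also be applied mechanically when porting several files. The sketch below rewrites only import lines and uses nothing but the standard library. The function name rewrite_imports is hypothetical, and the approach assumes that only the top-level module names differ; submodule paths may not map one-to-one, so the result should be reviewed by hand.

```python
import re

# Old module name -> new module name, as described in the text.
RENAMES = {
    "django_elasticsearch_dsl": "django_opensearch_dsl",
    "elasticsearch_dsl": "opensearchpy",
}

# \b keeps "elasticsearch_dsl" from matching inside "django_elasticsearch_dsl",
# because "_" counts as a word character.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def rewrite_imports(source: str) -> str:
    """Return source with module names replaced on import lines only."""
    lines = []
    for line in source.splitlines(keepends=True):
        if line.lstrip().startswith(("import ", "from ")):
            line = _PATTERN.sub(lambda m: RENAMES[m.group(1)], line)
        lines.append(line)
    return "".join(lines)


if __name__ == "__main__":
    old = (
        "from django_elasticsearch_dsl import Document, fields\n"
        "from django_elasticsearch_dsl.registries import registry\n"
        "from elasticsearch_dsl import Q\n"
    )
    print(rewrite_imports(old))
```

Restricting the substitution to import lines leaves strings and comments that mention the old names untouched.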

Populate Elasticsearch

The above article targeting Elasticsearch introduces the following command.

python manage.py search_index --rebuild

On the other hand, for OpenSearch, the following commands were needed.

Creating Indices

python manage.py opensearch index create
The following indices will be created:
        - users.
        - categories.
        - articles.

Continue ? [y]es [n]o : y

Creating index 'users'... OK
Creating index 'categories'... OK
Creating index 'articles'... OK

Registering Documents

python manage.py opensearch document index
The following documents will be indexed:
        - 5 User.
        - 3 Category.
        - 5 Article.

Continue ? [y]es [n]o : y

Indexing 5 User: OK
Indexing 3 Category: OK
Indexing 5 Article: OK

5 User successfully indexed, 0 errors:
3 Category successfully indexed, 0 errors:
5 Article successfully indexed, 0 errors:

Rebuilding Indices

python manage.py opensearch index rebuild

Additional: Adding Analyzers and Fields

Try the Field Classes described in the following section.

In the following example, an html_strip analyzer and a Keyword field are set for username.

# blog/documents.py

from django.contrib.auth.models import User
from django_opensearch_dsl import Document, fields
from django_opensearch_dsl.registries import registry

from blog.models import Category, Article

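from opensearchpy import analyzer

# Custom analyzer: strips HTML tags, then tokenizes, lowercases,
# removes stop words, and applies snowball stemming.
html_strip = analyzer(
    'html_strip',
    tokenizer="standard",
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)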

@registry.register_document
class UserDocument(Document):

    username = fields.TextField(
        analyzer=html_strip,
        fields={'raw': fields.KeywordField()}
    )

    class Index:
        name = 'users'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = User
        fields = [
            'id',
            'first_name',
            'last_name',
            # 'username' is declared explicitly above, so it is omitted here.
        ]

As a result of the above, the following mapping was registered in OpenSearch.

{
  "users" : {
    "mappings" : {
      "properties" : {
        "first_name" : {
          "type" : "text"
        },
        "id" : {
          "type" : "integer"
        },
        "last_name" : {
          "type" : "text"
        },
        "username" : {
          "type" : "text",
          "fields" : {
            "raw" : {
              "type" : "keyword"
            }
          },
          "analyzer" : "html_strip"
        }
      }
    }
  }
}
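
To check which fields in a mapping like this carry a keyword type, the mapping can be walked recursively. The following is a small standard-library sketch; the function name keyword_paths is hypothetical, and the mapping literal is copied from the output above rather than fetched from a cluster.

```python
import json

# The mapping shown above, pasted in as a string for illustration.
MAPPING_JSON = """
{
  "users": {
    "mappings": {
      "properties": {
        "first_name": {"type": "text"},
        "id": {"type": "integer"},
        "last_name": {"type": "text"},
        "username": {
          "type": "text",
          "fields": {"raw": {"type": "keyword"}},
          "analyzer": "html_strip"
        }
      }
    }
  }
}
"""


def keyword_paths(properties, prefix=""):
    """Return dotted paths of all keyword-typed fields and sub-fields."""
    paths = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "keyword":
            paths.append(path)
        # Multi-fields such as username.raw live under "fields".
        paths.extend(keyword_paths(spec.get("fields", {}), prefix=f"{path}."))
        # Object fields nest their children under "properties".
        paths.extend(keyword_paths(spec.get("properties", {}), prefix=f"{path}."))
    return paths


mapping = json.loads(MAPPING_JSON)
print(keyword_paths(mapping["users"]["mappings"]["properties"]))
```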

Using username.raw makes sorting and aggregation possible, because the keyword sub-field stores the value unanalyzed. Below is an example view. Prefixing the field name with - sorts in descending order.

# search/views.py

import abc

from django.http import HttpResponse
from opensearchpy import Q

from rest_framework.pagination import LimitOffsetPagination
from rest_framework.views import APIView

from blog.documents import ArticleDocument, UserDocument, CategoryDocument
from blog.serializers import ArticleSerializer, UserSerializer, CategorySerializer

class PaginatedOpenSearchAPIView(APIView, LimitOffsetPagination):
    serializer_class = None
    document_class = None

    @abc.abstractmethod
    def generate_q_expression(self, query):
        """This method should be overridden
        and return a Q() expression."""

    def get(self, request, query):
        try:
            q = self.generate_q_expression(query)
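            # Sort on the keyword sub-field; the "-" prefix means descending.
            # Note: username.raw exists only in the users index.
            search = self.document_class.search().query(q).sort(
                "-username.raw"
            )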
            response = search.execute()

            print(
                f'*** Found {response.hits.total.value} hit(s) for query: "{query}"')

            results = self.paginate_queryset(response, request, view=self)
            serializer = self.serializer_class(results, many=True)
            return self.get_paginated_response(serializer.data)
        except Exception as e:
            print(e)
            return HttpResponse(e, status=500)
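
To make the sort and aggregation on username.raw more concrete, the sketch below builds, as plain dictionaries, roughly the request body that OpenSearch receives. It does not use the client library. The function names, the aggregation name usernames, and the simple match query are assumptions for illustration, not what the view above generates verbatim.

```python
def sort_clause(*keys):
    """Translate "-field" style keys into OpenSearch sort clauses."""
    clauses = []
    for key in keys:
        if key.startswith("-"):
            # A leading "-" means descending order on the named field.
            clauses.append({key[1:]: {"order": "desc"}})
        else:
            clauses.append({key: {"order": "asc"}})
    return clauses


def build_body(query, sort_keys, agg_field):
    """Build a search body with a query, a sort, and a terms aggregation."""
    return {
        "query": {"match": {"username": query}},
        "sort": sort_clause(*sort_keys),
        # A terms aggregation counts documents per distinct field value.
        "aggs": {"usernames": {"terms": {"field": agg_field}}},
    }


body = build_body("alice", ["-username.raw"], "username.raw")
print(body["sort"])
```

Both the sort and the aggregation point at username.raw rather than username, since the analyzed text field is not suitable for either by default.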

Summary

I hope this serves as a useful reference for connecting Django with AWS OpenSearch.