Django + Elasticsearch

I stumbled upon a blog post about using Django and Elasticsearch to search TED talks, and dove deep to understand Elasticsearch in particular: what it is, how to run it, and how to communicate with it.

Elasticsearch is part of the Elastic Stack, an ecosystem of tools: Kibana, Logstash, Beats and Elasticsearch itself.

You communicate with Elasticsearch by sending HTTP requests to its RESTful API. In our document search, Elasticsearch is called as shown below. Queries are not written in SQL but in Elasticsearch's JSON-based Query DSL, and full-text search is fast because text is stored in an inverted index. In essence, Elasticsearch is a NoSQL database: it stores data as JSON documents.

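import json
import requests

# NSS_URL, post_data and args are defined elsewhere in our code
response = requests.post(NSS_URL, data=json.dumps(post_data), params=args)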

GET /tweets/doc/_search
{
  "query": {
    "match": {
      "author": "elon"
    }
  }
}
Response:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "author" : "Elon Musk",
          "text" : "This might be my finest work",
          "likes" : 43000
        }
      },
      {
        "_index" : "tweets",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.18232156,
        "_source" : {
          "author" : "Elon Musk",
          "text" : "Thank you!",
          "likes" : 42000
        }
      },
      {
        "_index" : "tweets",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.18232156,
        "_source" : {
          "author" : "Elon Musk",
          "text" : "@apirobotme your blog is the best blog about web development I have ever read. Thank you!",
          "likes" : 1000000
        }
      }
    ]
  }
}
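The console request above can also be sent from plain Python. This is a minimal sketch using only the standard library, assuming an unsecured Elasticsearch node at http://localhost:9200; that URL and the helper names build_match_query, search and extract_hits are assumptions for illustration. It uses POST, which the _search endpoint accepts just like GET with a body, and drops the legacy doc type from the path.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node without authentication
ES_URL = "http://localhost:9200"


def build_match_query(field, text):
    """Build the same Query DSL body as the console example."""
    return {"query": {"match": {field: text}}}


def search(index, body, base_url=ES_URL):
    """POST a Query DSL body to <index>/_search and return the decoded JSON."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_hits(result):
    """Return (score, source) pairs from a search response, best match first."""
    return [(hit["_score"], hit["_source"]) for hit in result["hits"]["hits"]]
```

Against the example data, extract_hits(search("tweets", build_match_query("author", "elon"))) should return the three (score, tweet) pairs shown above. From Elasticsearch 7.0 on, hits.total is an object such as {"value": 3, "relation": "eq"} rather than a plain number.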

In the blog post, the author builds a Django web app for searching TED talks: "The project will use PostgreSQL as a relational database, Elasticsearch and Django. The simplest way to set up everything is to use Docker. We have already used Docker previously when we run Elasticsearch in a container. But now we will have 3 containers. One for PostgreSQL, one for Elasticsearch and one for Django web application."

What strikes me as odd is that he still needed a relational database alongside Elasticsearch. Doesn't that defeat the purpose of Elasticsearch?

To insert the data into the relational database, here is the Django model in models.py:

# talks/models.py
from django.db import models
class Talk(models.Model):
    name = models.CharField(max_length=200)
    description = models.TextField()
    speaker = models.CharField(max_length=200)
    url = models.URLField()
    number_of_views = models.PositiveIntegerField()
    transcript = models.TextField()
    def __str__(self):
        return self.name
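The post does not show how the downloaded CSV gets into this table. One way, sketched with only the standard library: map CSV columns onto Talk fields and convert the view count to an integer. This assumes a single CSV that already holds every column; the column names in COLUMN_MAP and the helper name rows_to_talk_kwargs are hypothetical, so check them against the header of the file you actually download.

```python
import csv

# Hypothetical mapping from CSV column names to Talk model fields
COLUMN_MAP = {
    "name": "name",
    "description": "description",
    "main_speaker": "speaker",
    "url": "url",
    "views": "number_of_views",
    "transcript": "transcript",
}


def rows_to_talk_kwargs(csv_file):
    """Yield one dict of Talk(...) keyword arguments per CSV row."""
    for row in csv.DictReader(csv_file):
        kwargs = {field: row[column] for column, field in COLUMN_MAP.items()}
        # CSV values are strings; PositiveIntegerField needs an int
        kwargs["number_of_views"] = int(kwargs["number_of_views"])
        yield kwargs
```

Inside python manage.py shell you could then run Talk.objects.bulk_create([Talk(**kw) for kw in rows_to_talk_kwargs(f)]) on a file opened with newline="". Because bulk_create skips save() and the post_save signal, django_elasticsearch_dsl will not index those rows automatically; python manage.py search_index --rebuild fills the index afterwards.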

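After downloading the TED talks CSV data from Kaggle and loading it into the database, define an Elasticsearch index and a TalkDocument class. This class connects the relational database with Elasticsearch: it tells django_elasticsearch_dsl which model and which fields to index, and by default keeps the index in sync whenever a Talk is saved or deleted.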

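# talks/documents.py
# Uses the django-elasticsearch-dsl 6.x API; in 7.x, DocType was replaced
# by Document and documents are registered with @registry.register_document
from django_elasticsearch_dsl import DocType, Index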
from .models import Talk
talks = Index('talks')
talks.settings(number_of_shards=1, number_of_replicas=0)
@talks.doc_type
class TalkDocument(DocType):
    class Meta:
        # The model associated with Elasticsearch document
        model = Talk
        # The fields of the model you want to be indexed
        # in Elasticsearch
        fields = (
            'name',
            'description',
            'speaker',
            'number_of_views',
            'transcript',
        )
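Conceptually, the index definition and the Meta.fields list translate into plain REST calls. The sketch below illustrates that idea, not the library's internals: a body for PUT /talks with the same shard and replica settings, and an NDJSON payload for the _bulk endpoint that copies only the indexed fields from model-like objects. The helper names are hypothetical, and the action lines assume Elasticsearch 7 or later, where no _type is needed.

```python
import json

# Same fields as TalkDocument.Meta.fields (url is deliberately absent)
INDEXED_FIELDS = ("name", "description", "speaker", "number_of_views", "transcript")

# Body for PUT /talks, mirroring talks.settings(...)
CREATE_INDEX_BODY = {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}


def to_document(obj, fields=INDEXED_FIELDS):
    """Copy only the indexed attributes of a model-like object into a dict."""
    return {field: getattr(obj, field) for field in fields}


def bulk_payload(index, objects):
    """Build an NDJSON body for POST /_bulk: an action line, then the source."""
    lines = []
    for obj in objects:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(obj.pk)}}))
        lines.append(json.dumps(to_document(obj)))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"
```

Since url is not in Meta.fields, it never reaches the index; search results still include it because to_queryset() loads the matching rows from PostgreSQL. number_of_replicas=0 suits a single-node setup, where replica shards could not be allocated anyway.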

Now create a function that searches for relevant talks:

# talks/search.py
from elasticsearch_dsl.query import Q, MultiMatch, SF
from .documents import TalkDocument
def get_search_query(phrase):
    query = Q(
        'function_score',
        query=MultiMatch(
            fields=['name', 'description', 'speaker', 'transcript'],
            query=phrase
        ),
        functions=[
            SF('field_value_factor', field='number_of_views')
        ]
    )
    return TalkDocument.search().query(query)
def search(phrase):
    return get_search_query(phrase).to_queryset()
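The function_score query first scores each talk by text relevance across the four fields, then multiplies that score by the talk's view count. As a sketch, here is a plain dict roughly equivalent to what get_search_query(phrase) sends, plus the arithmetic under Elasticsearch's defaults (factor 1, modifier none, boost_mode multiply). The helper names function_score_body and final_score are hypothetical.

```python
def function_score_body(phrase):
    """Raw Query DSL roughly equivalent to get_search_query(phrase)."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": phrase,
                        "fields": ["name", "description", "speaker", "transcript"],
                    }
                },
                "functions": [
                    {"field_value_factor": {"field": "number_of_views"}}
                ],
            }
        }
    }


def final_score(text_score, number_of_views, factor=1.0):
    """Final score with modifier 'none' and boost_mode 'multiply'."""
    return text_score * (factor * number_of_views)
```

Because view counts run into the millions, this raw multiplication lets popularity dominate text relevance. field_value_factor also accepts a modifier such as log1p to dampen that, e.g. SF('field_value_factor', field='number_of_views', modifier='log1p'). Also keep in mind that a search returns 10 hits by default; slicing the search, e.g. get_search_query(phrase)[:50], asks for more before to_queryset() runs.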

Next, create a simple API using Django REST Framework.

The serializer, in talks/api/serializers.py:

# talks/api/serializers.py
from rest_framework import serializers
from ..models import Talk
class TalkSerializer(serializers.ModelSerializer):
    class Meta:
        model = Talk
        fields = (
            'name',
            'description',
            'speaker',
            'url',
            'number_of_views',
            'transcript',
        )

And the view, in talks/api/views.py:

# talks/api/views.py
from rest_framework import generics
from ..models import Talk
from ..search import search
from .serializers import TalkSerializer
class TalkList(generics.ListAPIView):
    queryset = Talk.objects.all()
    serializer_class = TalkSerializer
    def get_queryset(self):
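        q = self.request.query_params.get('q')
        # The template sends ?q= (an empty string) on page load, so treat
        # an empty query like a missing one and list every talk
        if q:
            return search(q)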
        return super().get_queryset()
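The template that follows reads response.data.results, but Django REST Framework only wraps list responses in a results key when pagination is enabled; otherwise ListAPIView returns a bare JSON array. A minimal settings sketch, assuming the project settings live in ted/settings.py and that page-number pagination is wanted; the page size of 10 is an arbitrary choice, not the author's.

```python
# ted/settings.py (excerpt, assumed location)
# Paginated list responses look like:
# {"count": ..., "next": ..., "previous": ..., "results": [...]}
REST_FRAMEWORK = {
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
    # Arbitrary page size; pick what suits the UI
    "PAGE_SIZE": 10,
}
```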

Then, a simple template that uses Vue.js and axios to query the API as the user types:

<!-- talks/templates/talks/talk_list.html -->
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Talk List</title>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/semantic-ui@2.4.2/dist/semantic.min.css">
</head>
<body>
    <div id="app">
        <div class="ui placeholder segment">
            <div class="ui input focus">
                <input
                    v-model="query"
                    type="text"
                    placeholder="Search for talks..."
                />
            </div>
        </div>
        <div class="ui three column stackable grid container">
            <div v-for="talk in talks" class="column">
                <a class="ui card" :href="talk.url">
                    <div class="content">
                        <div class="header">[[ talk.name ]]</div>
                        <div class="meta">[[ talk.speaker ]]</div>
                        <div class="description">[[ talk.description ]]</div>
                    </div>
                    <div class="extra content">
                        <i class="check icon"></i>
                        [[ talk.number_of_views ]] Views
                    </div>
                </a>
            </div>
        </div>
    </div>
    <!-- Pin Vue 2: this page uses the Vue 2 "new Vue(...)" API -->
    <script src="https://unpkg.com/vue@2"></script>
    <script src="https://unpkg.com/lodash"></script>
    <script src="https://unpkg.com/axios/dist/axios.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/semantic-ui@2.4.2/dist/semantic.min.js"></script>
    <script>
        new Vue({
            el: '#app',
            delimiters: ['[[', ']]'],
            data: {
                query: '',
                talks: []
            },
            // This hook will be executed when the instance of
            // Vue is created
            async created () {
                this.talks = await this.getTalks()
            },
            methods: {
                // Sends a request to our API in order to get
                // a list of talks
                async getTalks () {
                    const response = await axios.get('/api/v1/talks/', {
                        params: {
                            q: this.query
                        }
                    })
                    return response.data.results
                }
            },
            watch: {
                // This function will be executed every time
                // the user changes `query`.
                // Using debounce from lodash library here allows us to
                // delay sending a request to an API until
                // the user has stopped changing `query`.
                // Stopped typing, basically.
                query: _.debounce(async function () {
                    this.talks = await this.getTalks()
                }, 500)
            }
        })
    </script>
</body>
</html>
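
The watcher relies on lodash's _.debounce, which postpones a call until no new call has arrived for the given wait (500 ms here) and then runs only the latest one. The same idea, sketched in Python with threading.Timer; the debounce helper is hypothetical and not part of lodash or Vue. Every call cancels the pending timer and starts a fresh one, so only the last call in a burst runs.

```python
import threading


def debounce(wait_seconds):
    """Run the wrapped function only after calls stop for wait_seconds."""
    def decorator(func):
        timer = None
        lock = threading.Lock()

        def debounced(*args, **kwargs):
            nonlocal timer
            with lock:
                # Cancel the pending call and restart the countdown
                if timer is not None:
                    timer.cancel()
                timer = threading.Timer(wait_seconds, func, args, kwargs)
                timer.start()

        return debounced
    return decorator
```

Typing "e", "el", "elo", "elon" in quick succession therefore triggers a single request, for "elon", instead of four.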

Finally, update urlpatterns:

# ted/urls.py
from django.urls import path
...
from talks.views import talk_list

urlpatterns = [
    ...
    path('talks/', talk_list),
]

Naixian Zhang
