After stumbling upon a blog post about using Django and Elasticsearch to search TED talks, I dug deeper to understand Elasticsearch in particular: what it is, how to run it, and how to communicate with it.
Elasticsearch is part of the Elastic Stack, an ecosystem of tools that also includes Kibana, Logstash and Beats.
You talk to Elasticsearch by sending HTTP requests to its RESTful API. For document searching it is used as shown below: a Python `requests` call, then a raw search query followed by the JSON response. It is not a SQL kind of search; it is built for full-text search, which it typically handles faster and more efficiently than SQL queries would. In essence, Elasticsearch is a NoSQL database that stores data as JSON documents.
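import json
import requests
# NSS_URL, post_data and args are assumed to be defined elsewhere
response = requests.post(NSS_URL,
                         data=json.dumps(post_data),
                         params=args)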
GET /tweets/doc/_search
{
  "query": {
    "match": {
      "author": "elon"
    }
  }
}
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "tweets",
        "_type" : "doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "author" : "Elon Musk",
          "text" : "This might be my finest work",
          "likes" : 43000
        }
      },
      {
        "_index" : "tweets",
        "_type" : "doc",
        "_id" : "2",
        "_score" : 0.18232156,
        "_source" : {
          "author" : "Elon Musk",
          "text" : "Thank you!",
          "likes" : 42000
        }
      },
      {
        "_index" : "tweets",
        "_type" : "doc",
        "_id" : "4",
        "_score" : 0.18232156,
        "_source" : {
          "author" : "Elon Musk",
          "text" : "@apirobotme your blog is the best blog about web development I have ever read. Thank you!",
          "likes" : 1000000
        }
      }
    ]
  }
}
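To tie the query and response above back to Python, here is a minimal sketch. The helper names `build_match_query` and `extract_sources` are hypothetical, and the sample response is a trimmed copy of the one above. Nothing here contacts a real Elasticsearch server.

```python
import json


def build_match_query(field, value):
    """Return the body of a `match` query like the one shown above."""
    return {"query": {"match": {field: value}}}


def extract_sources(response):
    """Pull (score, document) pairs out of a search response dict."""
    return [(hit["_score"], hit["_source"]) for hit in response["hits"]["hits"]]


# Trimmed copy of the response above (same shape, only the first hit).
sample_response = {
    "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "tweets",
                "_type": "doc",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "author": "Elon Musk",
                    "text": "This might be my finest work",
                    "likes": 43000,
                },
            }
        ],
    }
}

body = json.dumps(build_match_query("author", "elon"))
results = extract_sources(sample_response)
print(body)
for score, doc in results:
    print(score, doc["author"], doc["likes"])
```

The string in `body` is what would go into the request body for `/tweets/doc/_search`, for example via `requests.post` together with a `Content-Type: application/json` header.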
In the blog, the author builds a Django web app for searching TED talks: "The project will use PostgreSQL as a relational database, Elasticsearch and Django. The simplest way to set up everything is to use Docker. We have already used Docker previously when we run Elasticsearch in a container. But now we will have 3 containers. One for PostgreSQL, one for Elasticsearch and one for Django web application."
What strikes me as odd is that he still had to set up a relational database in order to use Elasticsearch. Doesn't that defeat Elasticsearch's own purpose or merits?
To insert data into the relational database, here is the models.py file in Django:
# talks/models.py
from django.db import models


class Talk(models.Model):
    name = models.CharField(max_length=200)
    description = models.TextField()
    speaker = models.CharField(max_length=200)
    url = models.URLField()
    number_of_views = models.PositiveIntegerField()
    transcript = models.TextField()

    def __str__(self):
        return self.name
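Before the model can be searched it has to be filled with data. As a rough sketch of that step, the function below maps CSV rows onto the field names of `Talk`. The column names (`name`, `description`, `main_speaker`, `url`, `views`, `transcript`) are assumptions about the CSV layout, not something the post states, and the sample row is placeholder data. The function only builds plain dictionaries, so it runs without Django.

```python
import csv
import io

# Assumed mapping from CSV column names to Talk field names.
COLUMN_TO_FIELD = {
    "name": "name",
    "description": "description",
    "main_speaker": "speaker",
    "url": "url",
    "views": "number_of_views",
    "transcript": "transcript",
}


def rows_to_talk_kwargs(csv_file):
    """Yield one dict of Talk field values per CSV row."""
    for row in csv.DictReader(csv_file):
        kwargs = {field: row[column] for column, field in COLUMN_TO_FIELD.items()}
        # PositiveIntegerField expects an integer, but CSV values are text.
        kwargs["number_of_views"] = int(kwargs["number_of_views"])
        yield kwargs


# Placeholder data standing in for the downloaded file.
sample_csv = io.StringIO(
    "name,description,main_speaker,url,views,transcript\n"
    "Example talk,An example,Jane Doe,https://example.com/talk,1200,Hello world\n"
)
talk_rows = list(rows_to_talk_kwargs(sample_csv))
print(talk_rows[0]["speaker"], talk_rows[0]["number_of_views"])
```

Inside a Django shell, dictionaries like these could then be saved with something like `Talk.objects.bulk_create([Talk(**kw) for kw in talk_rows])`.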
After downloading the TED talks CSV data from Kaggle, define an Elasticsearch index and a TalkDocument class. This class basically connects our relational database with Elasticsearch:
# talks/documents.py
from django_elasticsearch_dsl import DocType, Index

from .models import Talk

talks = Index('talks')
talks.settings(number_of_shards=1, number_of_replicas=0)


@talks.doc_type
class TalkDocument(DocType):
    class Meta:
        # The model associated with Elasticsearch document
        model = Talk
        # The fields of the model you want to be indexed
        # in Elasticsearch
        fields = (
            'name',
            'description',
            'speaker',
            'number_of_views',
            'transcript',
        )
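One way to picture what `TalkDocument` does, as I understand it: for every `Talk` row, the listed fields are copied into a JSON document in the `talks` index. The sketch below imitates that with a plain dataclass instead of Django, so `FakeTalk` and `to_document` are stand-ins of my own and not part of django-elasticsearch-dsl. The sample values are placeholders.

```python
from dataclasses import dataclass, asdict

# Same field names as TalkDocument.Meta.fields above.
INDEXED_FIELDS = ("name", "description", "speaker", "number_of_views", "transcript")


@dataclass
class FakeTalk:
    """Stand-in for the Django Talk model (hypothetical, for illustration)."""
    id: int
    name: str
    description: str
    speaker: str
    url: str
    number_of_views: int
    transcript: str


def to_document(talk):
    """Keep only the fields that are listed for indexing."""
    row = asdict(talk)
    return {field: row[field] for field in INDEXED_FIELDS}


talk = FakeTalk(1, "Example talk", "An example", "Jane Doe",
                "https://example.com/talk", 1200, "Hello world")
document = to_document(talk)
print(sorted(document))
```

Note that `url` is not among the indexed fields, so the document in the index would not carry it. That may be one reason the search results are later turned back into database rows.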
Now create a function that searches for relevant talks:
# talks/search.py
from elasticsearch_dsl.query import Q, MultiMatch, SF

from .documents import TalkDocument


def get_search_query(phrase):
    query = Q(
        'function_score',
        query=MultiMatch(
            fields=['name', 'description', 'speaker', 'transcript'],
            query=phrase
        ),
        functions=[
            SF('field_value_factor', field='number_of_views')
        ]
    )
    return TalkDocument.search().query(query)


def search(phrase):
    return get_search_query(phrase).to_queryset()
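For reference, the query built by `get_search_query` should correspond to a raw request body like the one produced below. The sketch also imitates the scoring: with Elasticsearch's defaults, `field_value_factor` multiplies the text relevance score by the field value, so popular talks float upward. The function names, the sample scores and the view counts are illustrative assumptions, not values from the post.

```python
def build_function_score_body(phrase):
    """Raw JSON body that, as far as I can tell, matches the DSL query above."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "fields": ["name", "description", "speaker", "transcript"],
                        "query": phrase,
                    }
                },
                "functions": [
                    {"field_value_factor": {"field": "number_of_views"}}
                ],
            }
        }
    }


def rescore(text_score, number_of_views):
    """Default behaviour: final score = text score * field value."""
    return text_score * number_of_views


# Made-up numbers: a weaker text match with far more views ranks first.
candidates = [
    {"name": "A", "text_score": 2.0, "number_of_views": 1000},
    {"name": "B", "text_score": 0.5, "number_of_views": 100000},
]
ranked = sorted(
    candidates,
    key=lambda c: rescore(c["text_score"], c["number_of_views"]),
    reverse=True,
)
body = build_function_score_body("climate")
print([c["name"] for c in ranked])
```

Because the multiplication is linear, view counts can easily dominate relevance. Elasticsearch offers a `modifier` option on `field_value_factor` (for example `log1p`) to dampen that effect.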
Next, create a simple API using Django REST Framework.
This is serializers.py:
# talks/api/serializers.py
from rest_framework import serializers

from ..models import Talk


class TalkSerializer(serializers.ModelSerializer):
    class Meta:
        model = Talk
        fields = (
            'name',
            'description',
            'speaker',
            'url',
            'number_of_views',
            'transcript',
        )
This is the views.py file:
# talks/api/views.py
from rest_framework import generics

from ..models import Talk
from ..search import search
from .serializers import TalkSerializer


class TalkList(generics.ListAPIView):
    queryset = Talk.objects.all()
    serializer_class = TalkSerializer

    def get_queryset(self):
        q = self.request.query_params.get('q')
        if q is not None:
            return search(q)
        return super().get_queryset()
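From the client's point of view, the view above boils down to one endpoint with an optional `q` parameter. The sketch below builds such a URL and reads a response. The `/api/v1/talks/` path and the `results` key are the ones the template later in this post uses (`response.data.results`), which suggests pagination is switched on, though that is an inference. The helper names and the sample payload are hypothetical.

```python
from urllib.parse import urlencode


def talks_url(query=None, base="/api/v1/talks/"):
    """Build the list URL; without a query the view returns every talk."""
    if query is None:
        return base
    return base + "?" + urlencode({"q": query})


def talk_names(payload):
    """Read talk names from a paginated-style response payload."""
    return [talk["name"] for talk in payload["results"]]


# Placeholder payload in the shape assumed above.
sample_payload = {
    "count": 1,
    "next": None,
    "previous": None,
    "results": [{"name": "Example talk", "speaker": "Jane Doe"}],
}

print(talks_url())
print(talks_url("climate change"))
print(talk_names(sample_payload))
```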
Lastly, a nice template file:
<!-- talks/templates/talks.talk_list.html -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Talk List</title>
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/semantic-ui@2.4.2/dist/semantic.min.css">
</head>
<body>
  <div id="app">
    <div class="ui placeholder segment">
      <div class="ui input focus">
        <input
          v-model="query"
          type="text"
          placeholder="Search for talks..."
        />
      </div>
    </div>
    <div class="ui three column stackable grid container">
      <div v-for="talk in talks" class="column">
        <a class="ui card" :href="talk.url">
          <div class="content">
            <div class="header">[[ talk.name ]]</div>
            <div class="meta">[[ talk.speaker ]]</div>
            <div class="description">[[ talk.description ]]</div>
          </div>
          <div class="extra content">
            <i class="check icon"></i>
            [[ talk.number_of_views ]] Views
          </div>
        </a>
      </div>
    </div>
  </div>
  <script src="https://unpkg.com/vue"></script>
  <script src="https://unpkg.com/lodash"></script>
  <script src="https://unpkg.com/axios/dist/axios.min.js"></script>
  <script src="https://cdn.jsdelivr.net/npm/semantic-ui@2.4.2/dist/semantic.min.js"></script>
  <script>
    new Vue({
      el: '#app',
      delimiters: ['[[', ']]'],
      data: {
        query: '',
        talks: []
      },
      // This hook will be executed when the instance of
      // Vue is created
      async created () {
        this.talks = await this.getTalks()
      },
      methods: {
        // Sends a request to our API in order to get
        // a list of talks
        async getTalks () {
          const response = await axios.get('/api/v1/talks/', {
            params: {
              q: this.query
            }
          })
          return response.data.results
        }
      },
      watch: {
        // This function will be executed every time
        // the user changes `query`.
        // Using debounce from lodash library here allows us to
        // delay sending a request to an API until
        // the user has stopped changing `query`.
        // Stopped typing, basically.
        query: _.debounce(async function () {
          this.talks = await this.getTalks()
        }, 500)
      }
    })
  </script>
</body>
</html>
And update urlpatterns:
# ted/urls.py
...
from talks.views import talk_list

urlpatterns = [
    ...
    path('talks/', talk_list),