Step-by-step guide to integrating a Gatsby project with AWS OpenSearch

If you are new to AWS, fond of frontend development, and wondering how to integrate a Gatsby project with AWS OpenSearch, you are in the right place.

Damini Sinha

Oct. 17, 2022 | 6 min. read time

Gatsby is an open-source framework that combines functionality from React, GraphQL and Webpack into a single tool for building static websites and apps.

The Amazon OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale OpenSearch in the AWS Cloud. OpenSearch is a popular open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and click stream analytics.

Simple diagram showing steps taken to achieve connection from gatsby project (frontend) to AWS OpenSearch

Set up the Gatsby project (Step 1)

A useful link for me to set up a Gatsby project.

Install gatsby plugin — gatsby-plugin-aws-elasticsearch (Step 2)

First, install the package using npm or yarn.

yarn add -D gatsby-plugin-aws-elasticsearch

npm install --save-dev gatsby-plugin-aws-elasticsearch

Gatsby plugin configuration (Step 3)

Following the steps in the gatsby-plugin-aws-elasticsearch README, my project's gatsby-config.js looks something like this, provided the required AWS IAM roles are already in place.

{
  resolve: `gatsby-plugin-aws-elasticsearch`,
  options: {
    endpoint: process.env.ELASTIC_AWS_ENDPOINT,
    // To toggle the synchronisation.
    enabled: true,
    recreateIndex: true,
    // A GraphQL query to fetch the data.
    query: `
      query {
        nodes {
          edges {
            node {
              title
              name
              age
              id
              properties {
                customJob
              }
              contentId
            }
          }
        }
      }
    `,
    // A function which takes the raw GraphQL data and returns the nodes to process.
    selector: (data) => {
      return data.nodes.edges
    },
    // A function which takes a single node (from selector), and returns an object
    // with the data to insert into Elasticsearch.
    toDocument: (node) => ({
      id: node.id,
      title: node.title,
      name: node.name,
      age: node.age,
      properties: node.properties.customJob,
      contentId: node.contentId
    }),
    // The name of the index to insert the data into. If the index does not exist, it will be created.
    index: 'example-test',
    // An object with the mapping info for the index.
    mapping: {
      title: { type: 'keyword' },
      name: { type: 'text' },
      age: { type: 'text' },
      properties: { type: 'nested' },
      contentId: { type: 'keyword' }
    },
    // The AWS IAM access key ID.
    accessKeyId: process.env.ELASTIC_AWS_ACCESS_KEY_ID,
    // The AWS IAM secret access key.
    secretAccessKey: process.env.ELASTIC_AWS_SECRET_ACCESS_KEY
  }
}

This plugin must be placed last in your list of plugins so that it can query all the GraphQL data.
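Because the configuration reads three values from process.env, a missing variable tends to surface only as a failed request during the build. Here is a minimal sketch of an early check, assuming it runs at the top of gatsby-config.js. The helper name findMissingEnv is hypothetical and not part of the plugin.

```javascript
// Hypothetical helper (not part of the plugin): list the environment
// variables used in the plugin options that are unset or empty.
const REQUIRED_ENV = [
  'ELASTIC_AWS_ENDPOINT',
  'ELASTIC_AWS_ACCESS_KEY_ID',
  'ELASTIC_AWS_SECRET_ACCESS_KEY',
];

function findMissingEnv(env, names = REQUIRED_ENV) {
  return names.filter((name) => !env[name]);
}

// Warn early instead of discovering the problem through a rejected request.
const missing = findMissingEnv(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(', ')}`);
}
```

Whether to warn or to fail the build at this point is a design choice; a warning keeps local builds working when synchronisation is disabled.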

Tweaks made in the plugin to connect the Gatsby project to AWS OpenSearch (Step 4)

Since I cloned the gatsby-plugin-aws-elasticsearch plugin into my project, I want to mention the tweaks below, which resolved some of the issues I encountered.

Some of the plugin's files required changes, as follows.

1. Updating the logic around passing the credentials param in api.ts (plugin file), which signs a request with the AWS credentials

Issue faced: The request was not going through and received a 403 Forbidden response from AWS

Resolution: Passed the credentials param to both the sendRequest function and the signRequest function

export const sendRequest = async <Request, Response>(
  method: 'GET' | 'PUT' | 'POST' | 'DELETE',
  path: string,
  document: Request,
  options: Options,
  language = 'no',
  credentials: AWSCredentials
): Promise<ResponseOrError<Response>> => {
  // ...
  const headers = signRequest(request, credentials);

Code snippet for issue - forbidden request 403

2. Adding sessionToken in options.ts (plugin file), which explicitly declares the configuration available to the plugin

Issue faced: Authentication failed with the available credentials options, since the credentials are temporary

Resolution: Added sessionToken to the OptionsStruct object

...
accessKeyId: string(),
secretAccessKey: string(),
sessionToken: string(),
...

Code snippet for sessionTokens
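Tweaks 1 and 2 together amount to handing a complete credentials object to the signing step. Here is a minimal sketch, assuming the signer accepts an object with accessKeyId, secretAccessKey and an optional sessionToken. The helper name buildCredentials and the environment variable ELASTIC_AWS_SESSION_TOKEN are hypothetical and not taken from the plugin.

```javascript
// Hypothetical helper: collect the credentials from the plugin options.
// Temporary credentials are only valid together with their session token,
// so the token is included whenever it is present.
function buildCredentials(options) {
  const credentials = {
    accessKeyId: options.accessKeyId,
    secretAccessKey: options.secretAccessKey,
  };
  if (options.sessionToken) {
    credentials.sessionToken = options.sessionToken;
  }
  return credentials;
}

// Example with values read from the environment (variable names assumed).
const credentials = buildCredentials({
  accessKeyId: process.env.ELASTIC_AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.ELASTIC_AWS_SECRET_ACCESS_KEY,
  sessionToken: process.env.ELASTIC_AWS_SESSION_TOKEN,
});
```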

3. Returning toDocument in index.ts, which is the main file

Issue faced: During the project build, while debugging, toDocument came back as undefined even though the configuration provided was correct

Resolution: Return the result of toDocument inside the map callback

const nodes = options.selector(data).map((node) => {
  return options.toDocument(node);
});

Code snippet for issue toDocument as undefined
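To see why the return matters, the mapping step can be run on sample data outside of Gatsby. This is a minimal sketch with a made-up query result. It assumes each entry in edges wraps its fields in a node object, so the selector here unwraps it before toDocument runs; adjust that to the shape your own query returns.

```javascript
// Made-up sample data in the shape of the query from Step 3.
const data = {
  nodes: {
    edges: [
      {
        node: {
          id: '1',
          title: 'Engineer',
          name: 'Ada',
          age: '36',
          properties: { customJob: 'Developer' },
          contentId: 'c-1',
        },
      },
    ],
  },
};

const options = {
  // Assumption: unwrap edge.node so toDocument receives the fields directly.
  selector: (result) => result.nodes.edges.map((edge) => edge.node),
  toDocument: (node) => ({
    id: node.id,
    title: node.title,
    name: node.name,
    age: node.age,
    properties: node.properties.customJob,
    contentId: node.contentId,
  }),
};

// With the explicit return, every entry is a document rather than undefined.
const documents = options.selector(data).map((node) => {
  return options.toDocument(node);
});
console.log(documents);
```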

4. Adding request throttling logic in api.ts (plugin file): updated the code to handle the 429 response (exponential backoff)

Issue faced: Sometimes the request to AWS did not go through on the first attempt but succeeded on the second. I got status codes 503, 429 and 502 even though the AWS services were up and running

Resolution: Added retry logic with exponential backoff inside the getResponse function to handle the above errors

if (response.status === 429 || response.status === 503 || response.status === 502) {
  let counter = 1;
  const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
  // Exponential backoff: wait 3 s, 6 s, then 12 s before the retries
  while (counter <= 3) {
    await delay(3000 * 2 ** (counter - 1));
    response = await getResponse();
    counter++;
    if (response.status !== 429 && response.status !== 503 && response.status !== 502) {
      break;
    }
  }
}

Code snippet for throttling request
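The same idea can be pulled out into a small reusable helper. The following is a sketch rather than the plugin's code: sendWithBackoff and its option names are hypothetical, and send stands for any async function that resolves to an object with a status field.

```javascript
// Status codes treated as transient, matching the ones listed above.
const RETRYABLE_STATUS = new Set([429, 502, 503]);

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call send(), and retry with exponentially growing
// pauses while the response status is one of the retryable codes.
async function sendWithBackoff(send, { retries = 3, baseDelayMs = 3000 } = {}) {
  let response = await send();
  for (let attempt = 0; attempt < retries; attempt++) {
    if (!RETRYABLE_STATUS.has(response.status)) {
      break;
    }
    // Wait baseDelayMs, then twice that, then four times that, and so on.
    await delay(baseDelayMs * 2 ** attempt);
    response = await send();
  }
  return response;
}
```

Doubling the pause gives a throttled service progressively more room to recover, while the retry cap keeps a build from hanging indefinitely. The last response is returned either way, so the caller still decides how to treat a final 429.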

Connection to OpenSearch established (Step 5)

After this configuration and these tweaks, the connection to AWS should be successful. For my project, I was able to verify it with the CI/CD build pipeline.

The next bits get more exciting as we move towards AWS services…

AWS Lambda function (Step 6)

Lambda is a compute service that lets you run code without provisioning or managing servers.

Lambda runs instances of your function to process events. You can invoke your function directly using the Lambda API, or you can configure an AWS service or resource to invoke your function.

This Node.js code snippet, which consists of async functions with some logs, did the trick for me. The Lambda console can be used to debug the functions first, until you start getting a response body and a status code of 200.

Below is the code snippet where appending request headers helped me a lot to get the proper response.

const getQueryResults = (searchString, domain) =>
  new Promise((resolve, reject) => {
    const endpoint = new AWS.Endpoint(domain);
    const request = new AWS.HttpRequest(endpoint, region);
    request.method = 'GET';
    request.path += 'indexName/_search';
    // multi_match takes the search text in "query" and the target fields in
    // "fields"; searching only the title field is an assumption here
    request.body = JSON.stringify({
      query: {
        multi_match: { query: searchString, fields: ['title'] },
      },
    });
    // Append the headers needed for the signed request
    request.headers['host'] = domain;
    request.headers['Content-Type'] = 'application/json';
    request.headers['Content-Length'] = Buffer.byteLength(request.body).toString();
    // Sign the request with the credentials from the Lambda environment
    const credentials = new AWS.EnvironmentCredentials('AWS');
    const signer = new AWS.Signers.V4(request, 'es');
    signer.addAuthorization(credentials, new Date());
    const client = new AWS.HttpClient();
    client.handleRequest(
      request,
      null,
      (response) => {
        console.log(' request ', request);
        let responseBody = '';
        response.on('data', (chunk) => { responseBody += chunk; });
        response.on('end', () => { resolve(responseBody); });
      },
      (error) => { reject(error); }
    );
  });

Code snippet to query for search string

const AWS = require("aws-sdk");
const SSM = require("aws-sdk/clients/ssm");

var region = "aws-region";
// getParameter is an instance method, so create an SSM client first
const ssm = new SSM({ region });

const getSsmValue = async () => {
  const data = await ssm.getParameter({
    Name: `DomainSsmParameterName`,
  }).promise();
  const domainResponse = { statusCode: 200, body: data };
  return domainResponse;
};

exports.handler = async function (event, _context) {
  const domainResponse = await getSsmValue();
  if (domainResponse) {
    let domain = domainResponse.body.Parameter.Value;
    console.log("Received event: ", JSON.stringify(event, null, 2));
    const results = await getQueryResults("Example", domain);
    // An async handler returns its result instead of calling a callback
    return JSON.parse(results);
  } else {
    console.log(
      "Error while fetching ssm parameter value `DomainSsmParameterName`"
    );
  }
};

code snippet for appending request headers
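While debugging in the Lambda console, it helps to fail loudly when OpenSearch answers with an error body instead of search hits. This is a minimal sketch of such a check. The helper name parseSearchResponse is hypothetical, and it assumes that failures carry a top-level error field and that successful searches carry hits.

```javascript
// Hypothetical helper: parse the raw _search response and make sure it
// looks like a successful search before passing it on.
function parseSearchResponse(responseBody) {
  const parsed = JSON.parse(responseBody);
  if (parsed.error) {
    const reason = parsed.error.reason || JSON.stringify(parsed.error);
    throw new Error(`Search failed: ${reason}`);
  }
  if (!parsed.hits || !Array.isArray(parsed.hits.hits)) {
    throw new Error('Unexpected response: no hits found in the body');
  }
  // Return the full object so downstream code can still read hits.hits.
  return parsed;
}
```

In the handler, this could stand in for the plain JSON.parse call, so that a rejected query shows up as a Lambda error rather than as an empty result further down the line.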

GraphQL AppSync API (Step 7)

GraphQL APIs built with AWS AppSync give front-end developers the ability to query multiple databases, microservices, and APIs from a single GraphQL endpoint. There are a few steps to achieve this, which can be followed here:

1. Design your schemas

2. Attach a data source: for me, the data source for search is the AWS Lambda function mentioned in Step 6

3. Attach resolvers

4. Retrieve data with a GraphQL query

Attaching resolvers can get a bit tricky; here is an example based on the above schema.

Request mapping template — Resolver

#** RequestMapping array
The value of 'payload' after the template has been evaluated
will be passed as the event to AWS Lambda.
*#
{
    "version":  "2017-02-28",
    "operation": "Invoke",
    "payload": {
        "field": "search",
        "query": $util.toJson($context.arguments.query)
    }
}

Request Mapping
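With this template, the Lambda function receives the payload as its event, so the search string arrives in event.query and does not need to be hardcoded in the handler. Here is a minimal sketch of reading it. The helper name getSearchString is hypothetical.

```javascript
// Hypothetical helper: read the search string from the event that the
// request mapping template passes to Lambda.
function getSearchString(event) {
  if (!event || event.field !== 'search') {
    throw new Error(`Unsupported field: ${event && event.field}`);
  }
  if (typeof event.query !== 'string' || event.query.trim() === '') {
    throw new Error('Missing search query');
  }
  return event.query.trim();
}

// Example event in the shape of the payload defined in the template.
console.log(getSearchString({ field: 'search', query: 'Example' }));
```

Checking the field value first leaves room for the same function to serve other resolvers later, each with its own field name.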

## Declare an empty array as response mapping array
#set( $result = [] )
## Loop through results
#foreach( $entry in $context.result.hits.hits )
    ## Add each item to the result array
    $util.qr($result.add(
    {
        "title": $entry.get("_source")['title'],
        "age": $entry.get("_source")['age'],
        "name": $entry.get("_source")['name']
    }))
#end
#set( $res = { "aemResult": $result } )
## Parse the result
$util.toJson($res)

Response mapping template — Resolver
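For readers less familiar with the template syntax, the same transformation can be written in plain JavaScript. This is only an illustration of what the response mapping does, run against a made-up OpenSearch result; the function name mapSearchResult is hypothetical.

```javascript
// Illustration only: the response mapping expressed as a JavaScript function.
function mapSearchResult(result) {
  const items = result.hits.hits.map((entry) => ({
    title: entry._source.title,
    age: entry._source.age,
    name: entry._source.name,
  }));
  return { aemResult: items };
}

// Made-up OpenSearch result with a single hit.
const sample = {
  hits: {
    hits: [
      { _source: { title: 'Engineer', age: '36', name: 'Ada', contentId: 'c-1' } },
    ],
  },
};

console.log(JSON.stringify(mapSearchResult(sample)));
// prints {"aemResult":[{"title":"Engineer","age":"36","name":"Ada"}]}
```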

Now that there is a record in your database, you'll get results when you run a query. One of the main advantages of GraphQL is the ability to specify the exact data requirements that your application has in a query. You will start getting the result in the form of a JSON object.

Testing with UI Search component (Step 8)

Assuming the UI components are ready and connected to the GraphQL AppSync API, search results will start to appear in the UI. Done.
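For completeness, here is a minimal sketch of how a search component might call the AppSync endpoint. It assumes the API uses API key authorization, which AppSync reads from the x-api-key header, and a schema with a search(query) field returning aemResult. The query shape, the function names and the endpoint are assumptions for illustration.

```javascript
// Hypothetical GraphQL query; the field and type names depend on your schema.
const SEARCH_QUERY = `
  query Search($query: String) {
    search(query: $query) {
      aemResult { title name age }
    }
  }
`;

// Build the fetch() options for a GraphQL POST request with an API key.
function buildSearchRequest(apiKey, searchString) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': apiKey,
    },
    body: JSON.stringify({
      query: SEARCH_QUERY,
      variables: { query: searchString },
    }),
  };
}

// Send the request and return the list of results, or throw on GraphQL errors.
async function search(endpoint, apiKey, searchString) {
  const response = await fetch(endpoint, buildSearchRequest(apiKey, searchString));
  const { data, errors } = await response.json();
  if (errors) {
    throw new Error(errors.map((e) => e.message).join('; '));
  }
  return data.search.aemResult;
}
```

A search box could then call search(endpoint, apiKey, inputValue) on submit and render the returned list.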

If you have read this far, you definitely deserve a cookie. Don’t forget to clap and share. Thank you.

Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of DNB.