Crawler Walkthrough


This is an end-to-end walkthrough of indexing the SearchStax product documentation website using the SearchStax Site Search solution and its Crawler. This exercise takes about half an hour to complete.

Enterprise Clients Only!

The Crawler feature is restricted to Enterprise clients only. The following restrictions apply:

  • The feature is restricted to one crawl per day.
  • Crawls are limited to 10,000 pages or 100,000 pages per crawl, depending on your contract.
  • Individual page size is limited to 100 MB (HTML) or 1 GB (rich-text).

Getting search results for your website is surprisingly easy, but there are moments when you wonder what to do next. This discussion captures those moments.

We assume that, as an Enterprise client, you have a Site Search account courtesy of your SearchStax Onboarding Manager. Log in to the Site Search interface.

The exercise begins with setting up and running the Crawler. When the crawl is complete, we walk through the Site Search features that create the first search experience for your site.

Create the Site Search App

When you Create the Site Search App, be certain to configure it as a “Custom” application.

Configure and Run the Crawler

If the crawler feature is enabled for your account, you’ll find it listed under Connectors in the Search App’s navigation menu.

This link opens the Crawler list, which is initially empty.

A Search App can have one or more crawlers, depending on the terms of your contract. Each crawler can index a different website. Click Create a Crawler.

The next step is to provide your crawler with a name and a starting URL. Site Search will verify that the URL is reachable.

The crawler begins with a root URL and follows page links from there to all connected pages within the same corporate domain, subject to a configurable crawl depth.
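The traversal described above can be pictured as a breadth-first walk with a depth limit. The following is a minimal sketch, not the SearchStax implementation: the function name crawl_order, the in-memory SAMPLE_LINKS graph, and the example.com URLs are all hypothetical, and "same domain" is simplified here to an exact host-name match.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl_order(root_url, links, max_depth):
    """Return URLs in breadth-first order, staying on the root URL's host.

    `links` maps each page URL to the URLs it links to. It stands in for
    fetching and parsing real pages (an assumption made for illustration).
    """
    root_host = urlsplit(root_url).hostname
    seen = {root_url}
    queue = deque([(root_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(url, []):
            if urlsplit(target).hostname != root_host:
                continue  # skip links that leave the starting host
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical link graph: one off-site link and one page two clicks deep.
SAMPLE_LINKS = {
    "https://www.example.com/docs/": [
        "https://www.example.com/docs/a/",
        "https://other.example.org/x/",
    ],
    "https://www.example.com/docs/a/": ["https://www.example.com/docs/a/b/"],
}
```

With a depth of 1, the page two clicks from the root is never reached. Raising the depth to 2 brings it in, and the off-site link is skipped at any depth.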

You can Crawl Now if you wish, but we advise you to visit the list of crawler fields first. The crawler is limited to one run per day, and we need to set up a special field before launching it.

Configure the Crawler Fields

This is an optional step to demonstrate setting up a facet. The crawler imports a set of default fields from webpages (see Default Field Map for details). You will find, however, that your target website uses additional fields. Site Search lets you add these fields to the crawl.

The SearchStax website doc pages contain a Products meta tag that makes a simple facet demonstration:

<meta name="Products" content="Managed Search">

We’d like the crawler to import the value of this tag to the index.
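To check which pages of your own site carry such a tag before you configure the field, a short script can read it from the page HTML. This is a minimal sketch using Python's standard html.parser module. The names MetaTagReader and read_meta are hypothetical, and matching the tag name case-insensitively is an assumption rather than documented crawler behavior.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect the content values of <meta> tags with a given name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        # Assumption: compare names without regard to case.
        if name.lower() == self.tag_name and attributes.get("content") is not None:
            self.values.append(attributes["content"])


def read_meta(html, tag_name):
    """Return every content value found for the named meta tag."""
    reader = MetaTagReader(tag_name)
    reader.feed(html)
    return reader.values


page = '<html><head><meta name="Products" content="Managed Search"></head></html>'
print(read_meta(page, "Products"))
```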

Open the Manage Fields for Search Index section of the crawler settings. You’ll see the list of default fields.

These fields could be useful in your project, and are harmless if not. Click the Add Custom Field button. The resulting dialog box is described on the Crawler page.

  • Set the Custom Field Name to products. This is the label you will see in the Site Search lists of fields.
  • Select the Meta Tag Name option and enter Products. This is the meta tag name from the target page HTML.
  • For a field destined to become a facet, the string datatype is usually the best choice.

Click Add Field. You’ll see the new field in the list, labeled products_ss (your field name plus the string datatype).

When you are satisfied with the setup, click the Crawl Now button.

Crawling

As the crawl proceeds, you’ll see progress statistics updating.

The error count represents things like incompatible file types encountered in the crawl. Site Search cannot be more specific than that.

Inspect the Document Fields

This section presents some “tips and tricks” that are helpful for inspecting the output of the crawler before you configure the Search Fields and Result Fields in Site Search.

Wait Five Minutes!

Due to search-engine configuration settings, it may take as long as five minutes for the crawl data to be committed to the index. Until this time elapses, Site Search displays and query results will look the same as they did before the crawl.

Navigate to the Site Search App > Settings > Search API tab. You’ll find the Read-Write credentials about halfway down the screen.

Site Search auto-generates the Search API username(s), as in app1234-admin. If you don’t remember the associated password, simply type in a new one. It might be a good idea to write it down, because you’ll need it again in a minute.

Now scroll back up the screen and find the App’s Select Endpoint.

Copy the endpoint to a text buffer and make these changes to it:

  • Insert username:password@ in the URL between https:// and ss123456.
  • Change emselect to select.
  • Append ?q=*:*&wt=json&indent=true to the end, following select.

You should now have a URL similar to this:

https://app1234-admin:password@ss123456-aibgu8n5-us-west-1-aws.searchstax.com/solr/quickcrawl-1234/select?q=*:*&wt=json&indent=true
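If you expect to repeat this step, the three edits can be scripted. This is a minimal sketch using Python's standard urllib.parse module. The function name build_select_url is hypothetical, and the endpoint, username, and password are the placeholder values from the example above, not real credentials.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def build_select_url(endpoint, username, password):
    """Turn an App's Select Endpoint into a browsable 'match all' query URL."""
    parts = urlsplit(endpoint)
    # Edit 1: insert username:password@ between https:// and the host name.
    netloc = f"{quote(username, safe='')}:{quote(password, safe='')}@{parts.netloc}"
    # Edit 2: change the trailing emselect handler to select.
    path = parts.path
    if path.endswith("/emselect"):
        path = path[: -len("emselect")] + "select"
    # Edit 3: append the query string, leaving * and : unescaped.
    query = urlencode({"q": "*:*", "wt": "json", "indent": "true"}, safe="*:")
    return urlunsplit((parts.scheme, netloc, path, query, ""))


endpoint = "https://ss123456-aibgu8n5-us-west-1-aws.searchstax.com/solr/quickcrawl-1234/emselect"
print(build_select_url(endpoint, "app1234-admin", "password"))
```

The username and password are percent-encoded, so a password containing characters such as @ or / does not break the URL.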

Paste this string into the address field of a web browser and send it. It will return ten documents from your index, showing all of the fields in use and examples of their content. (Notice the products_ss custom field near the bottom of this list.)

      {
        "id":"https://www.searchstax.com/docs/searchstudio/analytics-glossary/",
        "exif_tenant_id":"2",
        "exif_crawlid":"2151",
        "exif_crawl_definition_id":"43",
        "exif_appid":"studio-1810",
        "url":["https://www.searchstax.com/docs/searchstudio/analytics-glossary/"],
        "paths":["docs / searchstudio / analytics-glossary"],
        "document_type":["html"],
        "date":"2024-06-24T02:36:12Z",
        "title":["Analytics Glossary - SearchStax Site Search Docs"],
        "headings1":["Analytics Glossary"],
        "headings2":["Questions?"],
        "description":["The SearchStax Site Search solution's Analytics Glossary is a summary of key terms and definitions used for analytics in Site Search."],
        "products_ss":["Site Search"],
        "content":["Analytics Glossary - SearchStax Site Search Docs Managed Search Site Search Help Center Support Contact Us Sign-In Site Search SearchStax Site Search Documentation User Documentation Analytics and Insights Analytics Glossary Navigation Managed Search Site Search Help Center Support Contact Us Sign-In Search Free Trial Sign-In Site Search Docs Home How to File a Support Ticket User Documentation Getting Started Getting Started Navigation Menu Task List Your First Search Analytics and Insights Overview Dashboard Searches Items Power Search Search Feedback Analytics Glossary Search Experience Overview Results Configuration Results and Display Faceting Sorting Basic Relevance Stopwords Synonyms Spell Check Relevance Modeling Model Details Search Fields Global Filters Ranking Rules Promotions Search Preview Editing Themes with Theme Editor Recommendations Overview Auto-Suggest Related Searches Developer Documentation Site Search Developer Documentation Search UI Kit Search UI Kit Accessibility searchstudio-ux-js What is searchstudio-ux-js Getting Started with searchstudio-UX-JS Widget Configuration Styling JS Widgets Input Widget JS Results Widget JS Facets Widget JS Pagination Widget JS Search Feedback Widget JS Related Searches Widget JS External Promotions Widget JS Sorting Widget JS Interfaces searchstudio-ux-vue What is searchstudio-ux-vue Getting Started with searchstudio-ux-vue Vue Widgets SearchstaxInputWidget for Vue SearchstaxResultWidget for Vue SearchstaxFacetsWidget for Vue Searchstax-PaginationWidget Vue SearchStaxInputWidget for Vue Searchstax-RelatedSearchesWidget Vue Searchstax- ExternalPromotions-Widget Vue Searchstax-SortingWidget for Vue Searchstax-ux-vue Styling searchstudio-ux-react What is searchstudio-ux-react Getting Started with searchstudio-ux-react SearchstaxInputWidget React SearchstaxResultWidget React SearchstaxFacetsWidget React Searchstax-PaginationWidget React React Widgets Searchstax-RelatedSearchesWidget React 
Searchstax-SearchOverviewWidget React Searchstax- ExternalPromotions-Widget React Searchstax-SortingWidget React Searchstax-ux-react Styling searchstudio-ux-angular What is searchstudio-ux-angular Getting Started with searchstudio-ux-angular Angular Widgets SearchstaxInputWidget Angular SearchstaxResultWidget Angular SearchstaxFacetsWidget Angular Searchstax-PaginationWidget Angular Searchstax-RelatedSearchesWidget Angular Searchstax- ExternalPromotions-Widget Angular Searchstax-SortingWidget Angular Searchstax-ux-angular Styling Searchstax-SearchOverviewWidget Angular Site Search App Overview Creating a Search App App Settings App Summary Tab Search API Tab Discovery API Tab Analytics API Tab Search Feedback Tab API Documentation Overview Search APIs Overview Search API Auto-Suggest API Analytics APIs Overview Tracking API (REST) Tracking API (Javascript) Reporting API (REST) Discovery APIs Overview Popular Searches API Related Searches API Ingest APIs Ingest API Security Overview Change Your Password Two-Factor Authentication Single Sign-On Single Sign-On - OKTA Multi-Language Experiences Multi-Language Experiences UX Accelerators Overview Browser Compatibility SearchJS Module External Promotions in SearchJS Search Widget Feedback Widget Integrations Sitecore Module 2.0.0 Upgrade to Module 2.0.0 Mapping Sitecore Fields Sitecore Module FAQ Custom Search Page Modify Configs Field for Faceting/Sorting in Sitecore Multi-Site Search Multi-Root Crawling Sitecore Module with Docker Sitecore SXA Sitecore Personalization Drupal Module Set Up Drupal Module Mapping Drupal Fields Drupal Module Functionality Drupal Auto-Suggest Drupal FAQ Theme Editor Reference Variables and Functions Example Theme Administration Billing & Payments Overview Service Limits Subscriptions Billing Information Payment Method Order History Invoice History Site Search Regions Account Management Overview Managing Users Cancel App Subscription Close Account Analytics Glossary This is a glossary of 
analytics metrics reported by the SearchStax Site Search solution. Description Example Total Searches Total Number of Search Requests in the selected time interval. These include searches with results and searches with no results No Result Searches Total Number of Searches that resulted in zero results. Searches with Results Total Number of Searches that resulted in 1 or more results being shown to the user Impressions Results shown to the user as the result of a search Hits Number of items that match this search. Total Searches with Clicks Total Number of Searches with at least one click on a result item. Total Clicks Total Number of clicks on the Search Results that were shown to the user Click-Through Rate Click-Through Rate is the percentage of tracked searches that generated at least one click on a result. Mathematically it is (Number of searches with at least 1 click / Total Searches) * 100 Searches with Clicks Total Number of Searches that resulted in a Click Through Event Avg Click Position Average of the positions of the impressions that were clicked in the list of results Average Search Latency Average Latency across all Searches Total Sessions Total number of user sessions when a search was fired. A user can execute multiple searches in a session. Searches per Session Number of Searches per session. Mathematically, it is Total Searches / Total Sessions Search Exits The number of searches made immediately before leaving the site % Search Exits Search Exits / Total Unique Searches % No Results Percent of Total Searches that produced zero results. MRR Mean Reciprocal Rank. The average of 1/Click Position for all click-through events. It normalizes the score to the range 0 to 1, with 1 being best. Total Impressions Total Number of Impressions shown across all searches Total Clicks Total Number of Clicks that were made on the shown impressions Item Click-Through Rate Percentage of Total Impressions that received a click-through event. 
Mathematically, it is  (Total Clicks / Total Impressions) * 100 Searches with Related Searches Number of total searches that offered at least one Related Search. Related Searches with Clicks Number of Related Searches that were launched by the search user. Related Searches Click-Through Rate “Related Searches with Clicks” divided by “Searches with Related Searches” x 100. Questions? Do not hesitate to contact the SearchStax Support Desk . Products Managed Search Site Search Pricing Managed Search Site Search Services Solr Consulting RESOURCES Blog Documentation Case Studies White Papers Videos Podcasts Events Support Company About Us Partners Careers Terms of Service Privacy Policy News Contact Us Contact Us SOCIAL NETWORKS SECURITY & COMPLIANCE AICPA SOC 2 GDPR HIPAA ISO/IEC 27001 Copyrights © SearchStax Inc.2014-2024. All Rights Reserved."],
        "_version_":1802708265532915712}

If you have difficulty making this work, contact SearchStax Support for assistance.

This output will be a convenient resource in the following steps.
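Because the content field is so long, it can help to reduce the response to a list of field names with a short sample of each. This is a minimal sketch that assumes you saved the JSON and loaded it into a dictionary with the usual Solr layout, where the documents sit under response.docs. The function name summarize_fields and the 60-character cutoff are hypothetical choices.

```python
def summarize_fields(solr_response, max_chars=60):
    """Map each field name to a short sample value from the first document that has it."""
    docs = solr_response.get("response", {}).get("docs", [])
    summary = {}
    for doc in docs:
        for field, value in doc.items():
            if field in summary:
                continue  # keep the first sample seen for each field
            # Multi-valued fields arrive as lists; sample their first entry.
            sample = value[0] if isinstance(value, list) and value else value
            summary[field] = str(sample)[:max_chars]
    return summary


# Abbreviated stand-in for a saved response (values taken from the example above).
saved = {
    "response": {
        "docs": [
            {
                "id": "https://www.searchstax.com/docs/searchstudio/analytics-glossary/",
                "title": ["Analytics Glossary - SearchStax Site Search Docs"],
                "products_ss": ["Site Search"],
            }
        ]
    }
}
for name, sample in summarize_fields(saved).items():
    print(f"{name}: {sample}")
```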

Configure Result Fields

We must now tell Site Search which fields to display in the results and how to assign them to pre-formatted locations in the Search UI App.

Configure Results first!

You must set up at least one Results field before creating a Relevance Model.

Click Results Configuration in the navigation menu. Expose the Results and Display tab.

The Results and Display tab lets us select fields from the index for display in the search results.

Reload the Schema!

After a crawler run, and in addition to waiting five minutes for the index to commit, you should click the Reload Schema button to update the list of potential display fields.

Choose a field from the Return Field list (in the upper red box above). You can add a human-friendly Label if needed. Then map the field value to a Results Card Field, as explained on the Results Configuration page. The (+) icon at the right adds the configured field to the list of display fields (the lower red box).

For this exercise, make the following mappings:

  • Map the index’s url field to the result card’s URL field. This will make the result-items clickable, linking them to the web pages they represent.
  • Map the index’s title field to the result card’s Title field. This will put the page’s title at the top of the result summary.
  • Map headings1, headings2, and headings3 to “No mapping” on the results card. This lists the field values below the result-item’s title.
  • Map the products_ss field to the result card’s Ribbon field. This will display the product name as a banner above the result-item.

When finished, click the Publish button.

Configure Search Fields

At this point, the webpage data is in the index, but we can’t search it yet. Before we can search, we have to choose which fields to search. For that step, we need to create a Relevance Model.

Click Relevance Modeling in the Navigation Menu, followed by Create a Model.

Give the Relevance Model a name and click the Create button.

The Search Fields tab of the Relevance Model screen tells Solr which index fields to search.

Reload the Schema!

After a crawler run, and in addition to waiting five minutes for the index to commit, you should click the Reload Schema button to update the list of potential search fields.

The left column lists the available fields in the schema (not necessarily present in the crawled documents). Click on a field to move it into the list of searchable fields.

In this case, click on title, description, headings1, headings2, and headings3. These fields contain the most relevant keywords of each page, making it easy to focus the search on pages with appropriate content.

To experiment with a facet list, also add the products_ss field to this list. Facets must be based on search fields.
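In plain Solr terms, a list of search fields resembles the qf (query fields) parameter of the edismax query parser. The sketch below shows how the fields chosen above could be expressed that way. It is an illustration only: how Site Search actually passes the Relevance Model to Solr is not described here, and the function name edismax_params is hypothetical.

```python
SEARCH_FIELDS = [
    "title",
    "description",
    "headings1",
    "headings2",
    "headings3",
    "products_ss",
]


def edismax_params(user_query, fields):
    """Build Solr request parameters that search only the listed fields."""
    return {
        "defType": "edismax",    # Solr's extended DisMax query parser
        "q": user_query,
        "qf": " ".join(fields),  # space-separated list of fields to search
    }


print(edismax_params("glossary", SEARCH_FIELDS))
```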

Then click Publish to re-issue the index. Publishing a small project like this one takes a couple of minutes.

Configure a Facet

The search results seem incomplete without at least one facet list off to the side. How do we set that up?

On the Results Configuration screen, select the Faceting tab. Full instructions for operating this screen are on the Faceting page. Check the box that enables faceting.

The fields in the red box let you select an index field to use in a facet. (If you don’t see it in the list, click that Reload Schema button again.) Select products_ss. You can add a label to be the title of the facet list. In our example, the facet options will be ranked by count.
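Ranking facet options by count means that the value carried by the most matching documents is listed first. This is a minimal sketch of that ordering over a handful of hypothetical documents. The function name facet_counts is an assumption, and the search engine computes these counts for you in practice.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many times each value of `field` occurs, most frequent first."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat a single value like a one-item list
        counts.update(values)
    return counts.most_common()


# Hypothetical documents; only the facet field matters here.
docs = [
    {"products_ss": ["Site Search"]},
    {"products_ss": ["Managed Search"]},
    {"products_ss": ["Site Search"]},
    {},  # a page without the Products meta tag contributes nothing
]
print(facet_counts(docs, "products_ss"))
```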

Don’t forget to Publish.

View Search Results

Now we can view our search results. The Search Preview screen lets us search the index and inspect our search results and facet list. Click Search Preview in the Navigation Menu.

This is a fully functional search environment. Note that our products_ss facet is present, along with ten documents from the index. The requested display fields are present, plus id and Elevated fields to assist with debugging.

From here, you can go back to the previous steps to tune your search. Just remember to Publish your changes before leaving each of the editing screens. You can then return to the Search Preview screen to inspect your changes.

Share Search Results

The Search Preview screen satisfies the developer’s need to view search behavior and result values, but it has one drawback. To see it, you have to be an authorized Site Search user. Experience has taught us that a search project often has many more stakeholders than developers. The project will need a public search portal for stakeholders.

The Search UI App is a sharable search page for colleagues who are not Site Search users. Click the Search UI Kit item in the Navigation Menu, and then select the Search UI App tab.

This screen provides a URL to a sharable search environment. You can View the page immediately, or you can use the Copy icon on the right to share the URL with coworkers.

Use the Regenerate button to refresh the Search UI App after making changes.

Questions?

Do not hesitate to contact the SearchStax Support Desk.