SearchStax Help Center

Crawler Content Field

The SearchStax Site Search solution’s Crawler add-on has a catch-all system field, content, that collects all the text on a web page and tokenizes it for keyword search.

Depending on the website, this field might collect more text than you really want to index. For instance, the SearchStax documentation pages (like this one) have a header, footer, and navigation pane (the left-side table of contents). The navigation pane in particular appears on every page and contains nearly every significant keyword in the corpus. As a result, almost every search matches almost every page.

To make search results more discriminating, we can delete the content field from the list of crawler fields and substitute a new paragraph field.

This is an XPath text field that uses the expression //p//text() to locate every <p> element on the page and capture its text for the index. This lets us search the page’s unique payload without matching items in the navigation pane.
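
To illustrate the difference between the two fields, here is a minimal Python sketch. It is not the Crawler's actual implementation. The sample page, the function names all_text and paragraph_text, and the use of the standard library's xml.etree.ElementTree are all assumptions made for illustration. ElementTree does not support the text() step of XPath, so the sketch emulates //p//text() by walking each <p> element and collecting its descendant text nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page: a navigation pane plus a main body.
PAGE = """<html>
  <body>
    <div id="nav"><ul><li>Crawler</li><li>Analytics</li><li>Relevance</li></ul></div>
    <div id="main">
      <h1>Crawler Content Field</h1>
      <p>The <b>paragraph</b> field captures body text.</p>
      <p>Navigation links are skipped.</p>
    </div>
  </body>
</html>"""


def all_text(root):
    """Approximate a catch-all field: every non-empty text node on the page."""
    return " ".join(t.strip() for t in root.itertext() if t.strip())


def paragraph_text(root):
    """Approximate //p//text(): text nodes located inside any <p> element."""
    chunks = []
    for p in root.iter("p"):
        chunks.extend(t.strip() for t in p.itertext() if t.strip())
    return " ".join(chunks)


root = ET.fromstring(PAGE)
content = all_text(root)          # includes the navigation pane keywords
paragraph = paragraph_text(root)  # includes only text inside <p> elements

print("Analytics" in content)     # True: the nav pane leaks into the catch-all text
print("Analytics" in paragraph)   # False: nav pane text is not inside a <p>
```

In this sketch a search for "Analytics" would match the page through the catch-all text but not through the paragraph text, which is the effect the paragraph field is meant to achieve.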

Note that the Crawler’s default set of fields already captures the page’s title and headings, which are likely to be the richest source of keywords that are truly pertinent to the topic.

Questions?

Do not hesitate to contact the SearchStax Support Desk.
