Default Field Map
The SearchStax Site Search solution offers a Crawler add-on for Enterprise clients. The Crawler indexes the pages of your website starting with a single root node.
The crawler maps information about a web page to Solr schema fields in the Site Search index. This page presents reference information about the crawler’s default field mappings. See Crawler for instructions on adding custom fields.
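To make the idea of field mapping concrete, the Python sketch below reads a few values from an HTML page and stores them under the Custom App field names from the default mapping table on this page (`title`, `description`, `keywords`, `headings1` to `headings4`). It is a hypothetical illustration built on the standard-library `html.parser`, not the crawler's implementation. Two parts are assumptions: the title comes from the `<title>` element, and the description and keywords come from `<meta name=...>` tags. The heading capture only approximates the default `//h1/text()` to `//h4/text()` selectors.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4"}


class DefaultFieldExtractor(HTMLParser):
    """Hypothetical sketch: collects a few default fields from an HTML page.

    Approximates <title>, meta description/keywords, and //h1/text() ..
    //h4/text() with the standard-library parser. Text inside nested tags
    (e.g. <h2>a <em>b</em></h2>) is included, unlike a strict /text().
    """

    def __init__(self) -> None:
        super().__init__()
        # Keys follow the Custom App column of the default mapping table.
        self.fields = {
            "title": "",
            "description": "",
            "keywords": "",
            "headings1": [],
            "headings2": [],
            "headings3": [],
            "headings4": [],
        }
        self._current = None  # "title" or "h1".."h4" while inside one
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.fields[name] = attr.get("content") or ""
        elif tag == "title" or tag in HEADING_TAGS:
            self._current, self._buf = tag, []

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if tag == "title":
            self.fields["title"] = text
        elif text:
            self.fields["headings" + tag[1]].append(text)
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)


def extract_fields(html: str) -> dict:
    """Parse an HTML string and return the collected field values."""
    parser = DefaultFieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.fields
```

For example, `extract_fields("<h1>Default Field Map</h1>")["headings1"]` yields `["Default Field Map"]`.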
Default Field Mappings
When a new crawler is created, the following default field mappings are set automatically for each document type.
- Mappings that cannot be changed are not displayed in the crawler's field mapping table.
- All other fields are displayed there. To change one of these mappings, delete the field and create a new one.
- The following rich text formats are supported: .pdf, .docx, .xlsx, .pptx, .txt, .rtf.
- The Drupal, Sitecore, and Custom App columns of the table below show the field labels as they appear in the crawler settings.
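You may want to check ahead of time which linked files use one of the rich text formats listed above. A small client-side filter is enough for that. The Python sketch below is a hypothetical helper (`is_supported_rich_text` is not part of any SearchStax API). It looks only at the file extension in the URL path. That is an assumption, because this page does not describe how the crawler itself detects a document's type.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Rich text formats listed as supported on this page.
SUPPORTED_RICH_TEXT = {".pdf", ".docx", ".xlsx", ".pptx", ".txt", ".rtf"}


def is_supported_rich_text(url: str) -> bool:
    """Return True if the URL path ends in a supported rich text extension.

    Hypothetical pre-check: only the extension is examined (case-insensitive);
    the query string and fragment are ignored.
    """
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return suffix in SUPPORTED_RICH_TEXT
```

For example, `is_supported_rich_text("https://example.com/files/Report.PDF?v=2")` returns `True`, while an `.html` page returns `False`.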
Field | Document Types | Field Category | Field Data Type | Can Change Mapping? | Drupal | Sitecore | Custom App | Page Property / Default Selector |
Unique ID | HTML, Rich Text | System Field | String | Yes | id | _uniqueid | id | |
URL | HTML, Rich Text | System Field | String | Yes | ss_url | url_s | url | |
Document Type: html, txt, pdf… | HTML, Rich Text | System Field | String | Yes | ss_document_type | document_type_s | document_type | document_type |
Title | HTML, Rich Text | System Field | Text | Yes | tm_X3b_en_title | title_txts_en | title | |
Content (text extracted from the document; a system field for rich text, optional and user-configured for HTML) | HTML, Rich Text | System Field | Text | Yes | tm_X3b_en_body | pagecontent_txts_en | content | content##//text() |
Description | HTML | Meta | Text | Yes | tm_X3b_en_description | renderedcontent_txts_en | description | description |
Timestamp (when the document was crawled) | HTML, Rich Text | System Field | Date | Yes | timestamp | displaydate_dts | date | |
Paths (the part of the URL after the domain, with each / padded with spaces: " / ") | HTML, Rich Text | System Field | Text | Yes | tm_X3b_en_paths | paths_txts_en | paths | |
Keywords | HTML | XPath | Text | Yes | tm_X3b_en__keywords | keywords_txts_en | keywords | keywords |
Heading Level 1 | HTML | XPath | Text | Yes | tm_X3b_en__headings1 | headings1_txts_en | headings1 | //h1/text() |
Heading Level 2 | HTML | XPath | Text | Yes | tm_X3b_en__headings2 | headings2_txts_en | headings2 | //h2/text() |
Heading Level 3 | HTML | XPath | Text | Yes | tm_X3b_en__headings3 | headings3_txts_en | headings3 | //h3/text() |
Heading Level 4 | HTML | XPath | Text | Yes | tm_X3b_en__headings4 | headings4_txts_en | headings4 | //h4/text() |
Crawler Definition ID | HTML, Rich Text | Crawler Internal Fields, not mapped, added automatically | String | No | ss_exif_crawl_definition_id | exif_crawl_definition_id_s | exif_crawl_definition_id | |
Crawl Run ID (crawl job) | HTML, Rich Text | Crawler Internal Fields, not mapped, added automatically | String | No | ss_exif_crawlid | exif_crawlid_s | exif_crawlid | |
Crawler tenant ID (SearchStax customer ID) | HTML, Rich Text | Crawler Internal Fields, not mapped, added automatically | String | No | ss_exif_tenant_id | exif_tenant_id_s | exif_tenant_id | |
Crawler application ID (corresponds to the Studio app, but is not the same ID) | HTML, Rich Text | Crawler Internal Fields, not mapped, added automatically | String | No | ss_exif_appid | exif_appid_s | exif_appid | |
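Client code that reads the index may work with more than one platform's field names. In that case it can help to keep the defaults in one place. The Python sketch below copies the field names from the table above into a dictionary. It adds three hypothetical helpers. `solr_field` looks up a single name. `translate_fields` renames a document's keys from one platform's names to another's. `padded_path` approximates the `paths` value.

The row keys, platform labels, and helper names are illustrative assumptions. The names cover only the defaults; a mapping you delete and recreate may use a different field. `padded_path` also assumes that the query string and fragment are not part of the value.

```python
from urllib.parse import urlsplit

PLATFORMS = ("drupal", "sitecore", "custom_app")

# Default field names (Drupal, Sitecore, Custom App) copied from the table.
# The row keys such as "unique_id" are hypothetical labels for this sketch.
DEFAULT_FIELDS = {
    "unique_id": ("id", "_uniqueid", "id"),
    "url": ("ss_url", "url_s", "url"),
    "document_type": ("ss_document_type", "document_type_s", "document_type"),
    "title": ("tm_X3b_en_title", "title_txts_en", "title"),
    "content": ("tm_X3b_en_body", "pagecontent_txts_en", "content"),
    "description": ("tm_X3b_en_description", "renderedcontent_txts_en", "description"),
    "timestamp": ("timestamp", "displaydate_dts", "date"),
    "paths": ("tm_X3b_en_paths", "paths_txts_en", "paths"),
    "keywords": ("tm_X3b_en__keywords", "keywords_txts_en", "keywords"),
    "headings1": ("tm_X3b_en__headings1", "headings1_txts_en", "headings1"),
    "headings2": ("tm_X3b_en__headings2", "headings2_txts_en", "headings2"),
    "headings3": ("tm_X3b_en__headings3", "headings3_txts_en", "headings3"),
    "headings4": ("tm_X3b_en__headings4", "headings4_txts_en", "headings4"),
    # Crawler internal fields: not mappable, but present in the index.
    "crawl_definition_id": ("ss_exif_crawl_definition_id", "exif_crawl_definition_id_s", "exif_crawl_definition_id"),
    "crawl_id": ("ss_exif_crawlid", "exif_crawlid_s", "exif_crawlid"),
    "tenant_id": ("ss_exif_tenant_id", "exif_tenant_id_s", "exif_tenant_id"),
    "app_id": ("ss_exif_appid", "exif_appid_s", "exif_appid"),
}


def solr_field(row: str, platform: str) -> str:
    """Return the default field name of a table row for one platform."""
    return DEFAULT_FIELDS[row][PLATFORMS.index(platform)]


def translate_fields(doc: dict, source: str, target: str) -> dict:
    """Rename a document's keys from one platform's default names to another's.

    Keys that are not default field names are kept unchanged.
    """
    src, dst = PLATFORMS.index(source), PLATFORMS.index(target)
    rename = {names[src]: names[dst] for names in DEFAULT_FIELDS.values()}
    return {rename.get(key, key): value for key, value in doc.items()}


def padded_path(url: str) -> str:
    """Approximate the default 'paths' value.

    Takes the part of the URL after the domain and pads every '/' with spaces.
    Assumption: the query string and fragment are ignored.
    """
    return urlsplit(url).path.replace("/", " / ")
```

For example, `translate_fields({"title": "Home"}, "custom_app", "drupal")` returns `{"tm_X3b_en_title": "Home"}`. Storing one tuple per row keeps each platform's names aligned, so the same dictionary supports lookups and renaming in any direction.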
Questions?
Do not hesitate to contact the SearchStax Support Desk.