Indexing website topics/sites

There are two ways to start indexing websites you're interested in.

First is to install one from our list of community built lenses

Lens manager

Building your own lens, which are simple text files which tell the application what to crawl & index. For example, here is a lens for recipes that I like to search for on a routine basis:


#![allow(unused)]
fn main() {
(
    version: "1",
    // Be proud of your creation :). Maybe soon we can share these ;)
    author: "Andrew Huynh",
    name: "recipes",
    description: Some(r#"
        A curated collection of websites with useful, high-quality recipes.
    "#),
    // Set to false if you want to disable this lens
    is_enabled: true,
    domains: [

        // Major sites that often have really good recipes
        "www.seriouseats.com",
        "cooking.nytimes.com",
        ...

        // Specific cuisines/sites that I've found randomly w/ high-quality recipes
        "www.hungryhuy.com",
        "www.vickypham.com",
    ],

    urls: [
        // URLs are considered prefixed, i.e. anything that starts w/ the following
        // will be matched and crawled.
        //
        // https://www.reddit.com/r/recipes/ -> matches
        // https://www.reddit.com/r/recipes_not/ -> does not match, notice the end slash.
        "https://www.reddit.com/r/recipes/",
    ],

    // Rules allow you to refine how the crawler determine whether it should crawl
    // a URL or not.
    rules: [
        // SkipURL is a simple regex (similar to ones in robots.txt) that when matches
        // a URL will skip crawling it.
        //
        // For example, below I'm skipping over any URLs that have the word "broccoli"
        // in the path, despite the benefits to my health.
        SkipURL("https://www.seriouseats.com/*broccoli*"),
        // Limits the depth of a URL to a certain depth.
        // For example:
        //  - LimitURLDepth("https://example.com/", 1) will limit it to https://example.com/<path 1>
        //  - LimitURLDepth("https://example.com/", 2) will limit it to https://example.com/<path 1>/<path 2>
        //  - etc.
        // In this case, we're limiting to only https://www.reddit.com/r/recipes/<post> so we don't
        // index too many comments/etc. from reddit.
        LimitURLDepth("https://www.reddit.com/r/recipes", 1),
    ]
)
}

Spyglass User Guide

Indexing website topics/sites