
Caching to speed up FillNode.aspx

Topics: Developer Forum
May 10, 2013 at 10:43 PM
Edited May 10, 2013 at 11:07 PM
With a 12-megabyte WebTOC.xml, single-user response times from FillNode.aspx are typically between 500 and 600 milliseconds, as measured in the Firefox Web Console. This is because FillNode.aspx parses the entire file into a new XPathDocument instance on every request.

If I change FillNode.aspx to cache the XPathDocument instance between requests, the single-user response time drops to around 120 milliseconds (typically between 115 and 125). This is with a CacheDependency on the file and lock (toc) { ... } around everything that accesses the XML nodes after they have been read from the cache. (Although XPathDocument appears read-only, it is not thread-safe.) Such serialization may scale poorly to hundreds of simultaneous users, but I suspect the current code wouldn't handle that situation well either. For better parallelism, one might consider setting up a pool of XPathDocument instances, or using the thread ID or processor ID as part of the cache key.
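
To illustrate the caching pattern outside ASP.NET, here is a minimal Python sketch, assuming that the file's modification time and size are an acceptable stand-in for a CacheDependency. CachedToc and query are hypothetical names; the lock plays the role of lock (toc) { ... } and serializes both reloading and reading, so it has the same scaling limitation described above.

```python
import os
import threading
import xml.etree.ElementTree as ET


class CachedToc:
    """Hypothetical cache for a parsed TOC file.

    The file is reparsed only when its (mtime, size) stamp changes, which
    approximates a CacheDependency. All access goes through one lock.
    """

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._stamp = None
        self._root = None
        self.loads = 0  # number of times the file was actually parsed

    def query(self, func):
        with self._lock:
            st = os.stat(self._path)
            stamp = (st.st_mtime_ns, st.st_size)
            if stamp != self._stamp:
                # File is new or has changed: parse it again.
                self._root = ET.parse(self._path).getroot()
                self._stamp = stamp
                self.loads += 1
            # Run the caller's query while still holding the lock.
            return func(self._root)
```

Every request then calls something like toc.query(lambda root: ...), and only the first request after a change pays for parsing.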

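These 120 milliseconds can easily be halved to about 60 by removing the if(root.Count == 0) check and instead checking, after the foreach loop, whether the StringBuilder is still empty. The reason is that XPathNodeIterator.Count generally has to walk a clone of the iterator to the end, so checking it first evaluates the query twice.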
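
The difference between the two patterns can be shown with a small Python sketch. LazyQuery is a hypothetical stand-in for a lazily evaluated node-set in which each full pass costs one evaluation, roughly as with XPathNodeIterator.Count; the "<no children>" output is likewise an assumption, not what FillNode.aspx actually writes.

```python
class LazyQuery:
    """Hypothetical lazily evaluated result set; counts full evaluations."""

    def __init__(self, items):
        self._items = items
        self.evaluations = 0

    def __iter__(self):
        self.evaluations += 1  # each pass re-runs the "query"
        yield from self._items

    def count(self):
        # Like a Count that walks a clone to the end: a full extra pass.
        return sum(1 for _ in self)


def render_with_count_check(query):
    # Original pattern: count first, then iterate (two evaluations).
    if query.count() == 0:
        return "<no children>"
    return "".join(f"<div>{title}</div>" for title in query)


def render_single_pass(query):
    # Improved pattern: iterate once, then check whether anything was written.
    parts = []
    for title in query:
        parts.append(f"<div>{title}</div>")
    if not parts:
        return "<no children>"
    return "".join(parts)
```

Both functions produce the same output, but the single-pass version evaluates the query once instead of twice when there are children.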

Far greater savings come from configuring IIS user-mode output caching to vary only by the "Id" and "topic" query-string variables and to ignore the "hash" variable. When the response is already in the cache, the response time is typically only 1 or 2 milliseconds. However, each cache entry covers only one parent node, and the output cache tends to discard rarely used entries soon after their last use. Thus, for the best performance, the IIS output cache should be combined with XPathDocument caching.
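
The key point of that configuration is how the cache key is built. A rough Python sketch of the idea, assuming a simple in-process dictionary rather than the real IIS cache (output_cache_key, cached_response and VARY_BY are hypothetical names), keeps only Id and topic so that requests differing only in hash share one entry:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Query-string variables that make responses differ; everything else
# (such as "hash") is ignored when building the cache key.
VARY_BY = ("Id", "topic")

_cache = {}


def output_cache_key(url):
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    kept = [(name, query[name][0]) for name in VARY_BY if name in query]
    return parts.path + "?" + urlencode(kept)


def cached_response(url, render):
    # Render only on a cache miss; later requests reuse the stored body.
    key = output_cache_key(url)
    if key not in _cache:
        _cache[key] = render(url)
    return _cache[key]
```

This sketch never evicts entries; the real output cache does, which is why it works best on top of a cached TOC.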
May 13, 2013 at 5:02 PM
A better solution:

Define classes for the HelpTOC and HelpTOCNode elements, and load WebTOC.xml into them with XmlSerializer. Take all the nodes that have IDs and collect them into a Dictionary<Guid, TocNode> instance. Insert this dictionary into Page.Cache with the appropriate CacheDependency. On each request, look up the requested node in the dictionary and output its children. Because Dictionary<TKey, TValue> supports multiple concurrent readers as long as it is not modified, no locking is needed after initialization.
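
The same structure can be sketched in Python, with ElementTree standing in for XmlSerializer. The attribute names Id, Title and Url are assumptions about the WebTOC.xml layout, and TocNode, load_toc and children_of are illustrative names. The index is built once and afterwards only read, so per-request work is a single dictionary lookup.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TocNode:
    """One HelpTOCNode element (attribute names Id/Title/Url are assumed)."""
    title: str
    url: str
    id: Optional[uuid.UUID] = None
    children: List["TocNode"] = field(default_factory=list)


def load_toc(xml_text: str) -> Tuple[List[TocNode], Dict[uuid.UUID, TocNode]]:
    """Parses the TOC once and indexes every node that has an Id."""
    root = ET.fromstring(xml_text)
    index: Dict[uuid.UUID, TocNode] = {}

    def convert(elem: ET.Element) -> TocNode:
        raw_id = elem.get("Id")
        node = TocNode(
            title=elem.get("Title", ""),
            url=elem.get("Url", ""),
            id=uuid.UUID(raw_id) if raw_id else None,
        )
        node.children = [convert(child) for child in elem.findall("HelpTOCNode")]
        if node.id is not None:
            index[node.id] = node
        return node

    top_level = [convert(elem) for elem in root.findall("HelpTOCNode")]
    return top_level, index


def children_of(index: Dict[uuid.UUID, TocNode], node_id: str) -> List[TocNode]:
    """Per-request work: one dictionary lookup, no XML parsing or locking."""
    try:
        node = index.get(uuid.UUID(node_id))
    except ValueError:  # malformed Id in the query string
        return []
    return node.children if node is not None else []
```

Parsing the Id as a UUID, like Guid in the C# version, also means the lookup is insensitive to the letter case of the Id in the request.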

After the application pool has been restarted, the first use of FillNode.aspx takes about 880 ms in this solution. That includes the time XmlSerializer spends generating and compiling the serialization assembly. Subsequent requests typically take only two milliseconds, so the IIS output cache is no longer useful. If the WebTOC.xml file is modified, then FillNode.aspx automatically reloads it with the previously generated serialization assembly; this takes about 430 ms.

The initialization time could be improved by replacing XmlSerializer with hardcoded methods that use XmlReader, but 880 ms is already fast enough for something that runs perhaps once a day, and custom deserialization would make later maintenance more expensive.
May 13, 2013 at 8:17 PM
If you'd care to share your changes, I'd be happy to merge them into a future release of SHFB. You can open a work item and attach them to it, or e-mail me the changes. My e-mail address is in the About box in the standalone GUI and in the footer of the pages in the help file.

May 15, 2013 at 12:01 AM
I'll have to ask whether my employer is willing to contribute the changes.
If not, someone else can perhaps implement a similar solution.
Jun 4, 2013 at 5:14 PM
I posted the modified FillNode.aspx to the patch tracker.
Jun 4, 2013 at 8:33 PM
Thanks. I'll merge it into the code for the next release.