Spell checking

Topics: User Forum
Apr 10, 2013 at 9:40 PM
Edited Apr 10, 2013 at 9:41 PM
Hi,

I would like to include spell checking in the help file build process. What needs to be done?

Regards,
Martin
Coordinator
Apr 11, 2013 at 3:17 AM
You can create a build component that will perform spell checking on the topics as they are built. A few years ago I played around with this and did create the code necessary to spell check the relevant XML elements and attributes in XML comments and MAML topics. I considered a build component or perhaps integrating it into the GenerateInheritedDocs tool, where it could provide better line and column information to locate the mistakes, at least in MAML topics. However, I set it aside as I didn't feel the end result was as useful as it could be.

Having the results in the build log isn't that convenient. You've still got to manually take the topic or member ID and go to the topic or find the relevant member in the code to fix the spelling mistakes, so you're constantly going back and forth between the build log and the code/topics. Suggestions for corrections can be dumped to the log, but again it's not as useful as running the topic through a spell checker, picking the correct suggestion, and having it inserted into the topic/code interactively while you're editing it.

There are several spell checking solutions for Visual Studio so they may be worth a look. A task on my To Do list has been to see about adding spell checking to the standalone GUI so that you can handle it while editing topics. Another is to see if it's worth trying to roll my own for Visual Studio that's specific to the MAML topics and XML comments or to find an acceptable existing add-in. These tasks are low on the list so I haven't done anything with them yet.

Eric
Apr 13, 2013 at 5:01 AM
I've found the old discussion somewhere on the net. I agree that it would be of limited use to dump the spelling mistakes to a build log. It would be impractical to try to fix errors by going back and forth between the log and the files. To fix errors efficiently, spell checking should be part of the editing experience. However, it would be OK to have the option to dump warnings to the build log, something like “topic X.aml has spelling errors, dumping first N misspelled terms…”. This could help enforce checking on legacy files or files that are frequently changed by other people who do not necessarily have or use a spell checker while editing content.
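
A rough sketch of what such a warning could look like is below. It is not anything SHFB provides: the `SpellingWarning` class, the `Build` method, and the `isCorrect` delegate standing in for a real dictionary lookup are all hypothetical, and the text is assumed to have been extracted from the topic already, with the tags removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpellingWarning
{
    // Builds a one-line warning for a topic, or returns null when nothing is misspelled.
    // isCorrect is a hypothetical stand-in for a real dictionary lookup.
    public static string Build(string topicFile, string text, Func<string, bool> isCorrect, int maxTerms)
    {
        // Collect each misspelled word once, in order of first appearance
        List<string> misspelled = Regex.Matches(text, @"[A-Za-z]+")
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(w => !isCorrect(w))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();

        if (misspelled.Count == 0)
            return null;

        // Report the total but list only the first maxTerms words
        return string.Format("Warning: topic {0} has {1} misspelled term(s); first {2}: {3}",
            topicFile, misspelled.Count, Math.Min(maxTerms, misspelled.Count),
            string.Join(", ", misspelled.Take(maxTerms)));
    }
}
```

Capping the list at N keeps the log readable for a topic with many mistakes while still flagging the file as one that needs a pass in the editor.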

Regarding spell checkers integrated with the editor, I’ve been using Spell Checker for a while, which works on code comments and documentation in code. I find it very useful. Unfortunately, it doesn’t help with aml files. I guess it does not recognize the tags, so it cannot tell which parts of the aml need to be checked. Could you recommend another similar tool that can spell check aml?

Martin
Coordinator
Apr 13, 2013 at 8:32 PM
Your earlier post got me curious so I went looking and did find that spell checker add-in. I noticed the issue with XML and MAML files so I downloaded the code to see how it works. Adding one extra classification type got it working with XML and MAML element content, so I think it's a good fit. A few other minor tweaks should allow it to spell check text in attributes where appropriate too (e.g. the title attribute on a code element).

The one downside is that it uses a WPF TextBox control to do the spell checking so it has issues with different languages. However, it looks like it would be relatively straightforward to update the spelling code to use something else like Hunspell to open it up and make it more useful if people need support for dictionaries in other languages.

It could probably be used to add a GUI to spell check interactively or do the entire project/solution with output to the build window or a tool window. I think an option like that would be equivalent to a build component but faster as it wouldn't have to go through the entire build. The location info would be accurate too and could probably be used to jump to the misspelled word to correct it.

Eric
Apr 16, 2013 at 2:05 PM
You say "Adding one extra classification type got it working with XML and MAML element..." Could you please share the code?

The downside with WPF is not an issue in my case because I am targeting English only.

Martin
Coordinator
Apr 16, 2013 at 8:45 PM
The change was in the CommentTextTagger class. In the GetTags() method, add the extra check for the "xml text" name. That also lets it spell check the inner text in literal XML embedded in VB.NET code.
if((name.Contains("comment") || name.Contains("string") || name.Contains("xml text")) &&
  !name.Contains("xml doc tag"))
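
For anyone applying the change, the condition can be read as a standalone predicate. This is only a sketch: the `ClassificationFilter` class and `ShouldSpellCheck` method are hypothetical names, and lowercasing the classification name first is an assumption about how `name` is prepared in the add-in.

```csharp
using System;

public static class ClassificationFilter
{
    // Returns true when a span with this classification name should be spell checked.
    // Assumption: the name is matched case-insensitively, so it is lowercased here.
    public static bool ShouldSpellCheck(string classificationName)
    {
        if (string.IsNullOrEmpty(classificationName))
            return false;

        string name = classificationName.ToLowerInvariant();

        // Comments, strings, and XML element text are checked; XML doc tags are not
        return (name.Contains("comment") || name.Contains("string") || name.Contains("xml text")) &&
            !name.Contains("xml doc tag");
    }
}
```

With this, a name such as "xml text" passes the filter, while "xml doc tag" and unrelated names such as "keyword" do not.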
I've fixed up some other issues in the code as well, such as ignoring escape sequences and HTML entities. I've also got it working with NHunspell and a default English dictionary. I'm pretty sure I can extend it with a few other features to make it more useful when spell checking XML and MAML documents too. I plan on creating a separate project here on CodePlex and releasing it there since it's not dependent on SHFB or Sandcastle. I'll probably do that soon and get feedback on what people might like to see with regard to additional features. I'll include it as an installable option in the SHFB/Sandcastle guided installer.
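
One way to ignore escape sequences and HTML entities is to blank them out before splitting the text into words, as in the sketch below. This is an illustration rather than the add-in's actual code: the `SpellCheckText` class, its methods, and the two regular expressions are assumptions, and the patterns cover only common C-style escapes and named or numeric entities.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpellCheckText
{
    // HTML entities (&amp; &#169; &#xA9;) and C-style escapes (\n, \t, \u0041, \x41)
    private static readonly Regex Noise = new Regex(
        @"&(#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);|\\(u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{1,4}|[abfnrtv0\\'""])");

    // A word is a run of letters, optionally with inner apostrophes (e.g. "doesn't")
    private static readonly Regex Word = new Regex(@"[A-Za-z]+(?:'[A-Za-z]+)*");

    // Replaces each entity or escape with spaces of the same length so that
    // the offsets of the remaining words still match the original text.
    public static string Mask(string text)
    {
        return Noise.Replace(text, m => new string(' ', m.Length));
    }

    // Returns each word to check along with its offset in the original text.
    public static List<KeyValuePair<int, string>> GetWords(string text)
    {
        var words = new List<KeyValuePair<int, string>>();

        foreach (Match m in Word.Matches(Mask(text)))
            words.Add(new KeyValuePair<int, string>(m.Index, m.Value));

        return words;
    }
}
```

Masking with spaces instead of deleting keeps word positions intact, which matters if the offsets are later used to mark the misspelled word in the editor. A known limitation of this simple pattern is that it also masks backslash sequences inside file paths.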

In the meantime, if you'd like a copy to install and play around with, contact me via e-mail and I'll send you a copy of the VSIX file to install. My e-mail address is in the About box in the standalone GUI and in the footer of the pages of the SHFB help file.

Eric
Apr 16, 2013 at 10:43 PM
I've added the change to the if statement and now spelling works for the text in aml files. This is great! Thank you!

I am looking forward to having the "official" version. I'll send you an email to get the preview and share my impressions when I start using it.

Martin