UTF8 header in XML

Topics: Developer Forum, Project Management Forum, User Forum
Sep 9, 2009 at 6:52 PM


I ran into a weird and nasty problem trying to integrate an SHFB-generated Help 2 file into a VS2008 'Help Integration Wizard' setup project.

In step 3, when preparing the TOC, clicking the '<< Include' button produced the error 'Data at the root level is invalid. Line 1, position 1.', which is not very informative. So I started debugging and found that the TOC XML starts with 0xEF, 0xBB, 0xBF ... in other words, with the UTF-8 header (the byte order mark). I checked with Namespace# 2.0 and saw in HexView that all the XML files start with the UTF-8 header too, so it confuses only the TOC builder in the 'Help Integration Wizard'. I think this is really a bug in VS2008, but the easy fix is to tell SHFB to use ASCII encoding instead of UTF-8. Can someone tell me how to do this?
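For anyone who wants to check a file without a hex viewer, here is a minimal sketch that tests whether a file begins with the three UTF-8 header bytes mentioned above. The class and method names (BomCheck, HasUtf8Bom) are made up for illustration and are not part of SHFB or the wizard.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of SHFB: checks for the UTF-8 byte order mark.
static class BomCheck
{
    // Returns true if the file starts with the bytes EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] head = new byte[3];
        int total = 0;

        using (FileStream fs = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until
            // three bytes have been read or the end of the file is reached.
            while (total < head.Length)
            {
                int n = fs.Read(head, total, head.Length - total);
                if (n == 0)
                    break;
                total += n;
            }
        }

        return total == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
    }
}
```

Calling BomCheck.HasUtf8Bom on the generated TOC file should tell you whether the header is present before you hand the file to the wizard.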

P.S. The Help 2.x file generated by DocProject used ASCII encoding. Maybe there is a Sandcastle configuration setting that is different in SHFB?



Sep 9, 2009 at 8:24 PM
Edited Sep 9, 2009 at 8:25 PM

You can edit the template files found in the .\Templates folder in the SHFB installation folder.  Edit the Help2x*.* files and remove the UTF-8 header or resave them using ASCII encoding.
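If there are many template files, removing the header by hand gets tedious. Below is a rough sketch of how it could be scripted. The names BomStripper, StripUtf8Bom and StripAll are invented for this example, and the folder you pass in is whatever your own SHFB Templates path happens to be. Back up the files first.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of SHFB: removes a leading UTF-8 byte order mark.
static class BomStripper
{
    // Rewrites the file without its first three bytes if they are EF BB BF.
    // Returns true if the file was changed.
    public static bool StripUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);

        if (bytes.Length < 3 || bytes[0] != 0xEF || bytes[1] != 0xBB || bytes[2] != 0xBF)
            return false;

        byte[] rest = new byte[bytes.Length - 3];
        Array.Copy(bytes, 3, rest, 0, rest.Length);
        File.WriteAllBytes(path, rest);
        return true;
    }

    // Applies StripUtf8Bom to every file in the folder that matches the pattern,
    // for example "Help2x*.*", and returns how many files were changed.
    public static int StripAll(string folder, string pattern)
    {
        int changed = 0;

        foreach (string file in Directory.GetFiles(folder, pattern))
        {
            if (StripUtf8Bom(file))
                changed++;
        }

        return changed;
    }
}
```

This only changes the files on disk. It does nothing about a build step that rewrites them later with a different encoding.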



Sep 9, 2009 at 11:12 PM

Hmmm ... I tried this but the result was the same. To be safe, I re-saved all the files in the SHFB installation folders and my help project files using ASCII encoding, and the result was still the same. Is it possible that when the template files are combined with the content, they are saved with UTF-8 encoding again?

( I fixed some minor errors in the XSD files that break IntelliSense in VS2008 on my machine. Are you interested in taking a look at the changes? )

Sep 10, 2009 at 1:26 AM

Well ... as I suspected, the encoding is changed while the working files are being built.

Quick & Dirty fix:

protected void UpdateTableOfContents(HelpFileFormat format)
{
    // ... (rest of the method unchanged)
    content = BuildProcess.ReadWithEncoding(tocFile, ref enc);
    enc = Encoding.ASCII; // VS2008 Help Integration Wizard bugfix: force ASCII so no UTF-8 header is written

    // ...

    // Write the file back out with the appropriate encoding
    using(StreamWriter sw = new StreamWriter(tocFile, false, enc))
    // ...
}

Explanation: In the SandcastleBuilder.Utils.BuildEngine.BuildProcess.ReadWithEncoding(string filename, ref Encoding encoding) method, when the file is an XML file, the encoding is always set to the value of the encoding attribute in <?xml version="1.0" encoding="utf-8"?>. In other words, it does not matter whether the template file has a Unicode header or not: this method always sets the enc variable in UpdateTableOfContents() to UTF-8, so the StreamWriter presumably writes the UTF-8 header again when the file is saved. The Help Integration Wizard then throws its unhelpful exception until enc is forced to Encoding.ASCII.
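To illustrate the explanation above, here is a small standalone sketch (PreambleDemo and WriteAndReadBack are made-up names, not SHFB code) showing what a StreamWriter actually puts on disk for different encodings. Encoding.UTF8 emits the three-byte header, Encoding.ASCII does not, and new UTF8Encoding(false) is UTF-8 without the header. I have not tested whether the wizard accepts header-less UTF-8, so treat that variant as an assumption. Note also that Encoding.ASCII replaces any non-ASCII character with '?', which matters if TOC titles contain such characters.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of SHFB.
static class PreambleDemo
{
    // Writes the text to a temporary file through a StreamWriter using the
    // given encoding and returns the raw bytes that ended up in the file.
    public static byte[] WriteAndReadBack(string text, Encoding enc)
    {
        string path = Path.GetTempFileName();

        try
        {
            using (StreamWriter sw = new StreamWriter(path, false, enc))
            {
                sw.Write(text);
            }

            return File.ReadAllBytes(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With the text "<a/>", Encoding.UTF8 yields 7 bytes (EF BB BF followed by the 4 characters), while both Encoding.ASCII and new UTF8Encoding(false) yield 4 bytes starting with 0x3C.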

This bug cost me two endless nights ...  I really hate such things, but now everything works fine :)