Plain CDATA comments are being mangled

Apr 12, 2010 at 9:56 PM

Eric:

I have a chunk of XML code inside a lengthy <remark> decorating a class. Most of the code sample is inside a CDATA section so that it passes through unchanged, but some processing is clearly still occurring.

Here are the doc-comments from the source file:


   

    /// <code lang="xml" title="Context Schema Summary">
    /// <![CDATA[
    /// <?xml version="1.0" encoding="UTF-8"?>
    ///<EditorContext>
    ///  <Delimiters>
    ///    <CommentTokens [font style attributes] >
    ///      <BlockCommentStartMark>/*</BlockCommentStartMark>
    ///      <BlockCommentEndMark>*/</BlockCommentEndMark>
    ///    </CommentTokens>
    ///    <EndOfLineCommentTokens [font style attributes] >
    ///      <EndOfLineCommentMark>--</EndOfLineCommentMark>
    ///    </EndOfLineCommentTokens>
    ///    <QuoteTokens>
    ///        <QuoteToken [font style attributes] >'</QuoteToken>
    ///        . . .
    ///    </QuoteTokens>
    ///    <VariableTokens>
    ///        <VariableToken [font style attributes] >@</VariableToken>
    ///        . . .
    ///    </VariableTokens>
    ///    <SeparatorTokens>`-=~!@#$%^&amp;()+[]\{}|;':",./&gt;&lt;?&#xa;&#xd;&#x9;&#x20;</SeparatorTokens>
    ///    <NumberTokens [font style attributes w/bgcolor]/>
    ///    <PlaceholderTokens [font style attributes w/bgcolor] >
    ///        <PlaceholderStartMark>_{</PlaceholderStartMark>
    ///        <PlaceholderEndMark>}_</PlaceholderEndMark>
    ///    </PlaceholderTokens>
    ///  </Delimiters>
    ///  <WordsAndPhrases>
    ///    <WordGroup [type="some type name"] [font style attributes] >
    ///      <Keyword [alias attribute] [whiteSpace attribute] >SELECT</Keyword>
    ///      . . .
    ///    </WordGroup>
    ///    . . .
    ///  </WordsAndPhrases>
    ///</EditorContext>
    /// ]]>
    ///
    /// [font style attributes] ::=
    ///     font-family=<i>.NET font family name</i> (e.g. "Verdana", "Arial Black", etc.)
    ///     color=<i>.NET color name</i> (e.g. "Blue", "Magenta", etc.)
    ///     bold="true" | "false"
    ///     italic="true" | "false"
    ///     
    /// [whiteSpace attribute] ::=
    ///     whiteSpace="preserve" | "replace" | "collapse"
    /// </code>


And here is what comes out from SHFB:


<?xml version="1.0" encoding="UTF-8"?>
<EditorContext>
<Delimiters>
<CommentTokens [fontstyle attributesBlockCommentStartMark>/*</BlockCommentStartMark>
<BlockCommentEndMark>*/</BlockCommentEndMark>
</CommentTokens>
<EndOfLineCommentTokens [fontstyle attributesEndOfLineCommentMark>--</EndOfLineCommentMark>
</EndOfLineCommentTokens>
<QuoteTokens>
<QuoteToken [fontstyle attributes'</QuoteToken>
. . .
</QuoteTokens>
<VariableTokens>
<VariableToken [font style attributes]>@</VariableToken>
. . .
</VariableTokens>
<SeparatorTokens>`-=~!@#$%^&amp;()+[]\{}|;':&quot;,./&amp;gtlt;?&amp;#xaxd;&amp;#x9x20;&lt;/SeparatorTokensNumberTokens [fontstyle attributesw/bgcolorPlaceholderTokens [fontstyle attributesw/bgcolorPlaceholderStartMark>_{</PlaceholderStartMark>
<PlaceholderEndMark>}_</PlaceholderEndMark>
</PlaceholderTokens>
</Delimiters>
<WordsAndPhrases>
<WordGroup [type="some type name"] [fontstyle attributesKeyword [aliasattribute] [whiteSpaceattribute] >SELECT</Keyword>
. . .
</WordGroup>
. . .
</WordsAndPhrases>
</EditorContext>


If you compare the two fragments, there are two types of problems:
(1) Brackets: several instances arise where a right square bracket is followed by a right angle bracket.
The first is on the line containing "CommentTokens".
(2) Entities: examine the "SeparatorTokens" line.

(If you want to see the finished HTML in real life, go to
http://cleancode.sourceforge.net/api/csharp/html/T_CleanCode_ChameleonRichTextBoxControls_ChameleonRichTextBox.htm
then scroll down to the piece of XML code labeled "Context Schema Summary")

Is there some encoding I need to apply to get a faithful reproduction of my input text?

Thanks,
~~Michael

Coordinator
Apr 14, 2010 at 2:53 AM

The colorizer uses regular expressions to parse the text based on the selected language. Since your example isn't valid XML, the brackets cause it to parse the text incorrectly, and thus you get the odd results. The only ways to fix it would be to not colorize it, or to modify the example so that the attributes are in a valid form but are just placeholders. For example: fontStyleAttribute="value (see list below)" or something like that.
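
As a rough sketch of the placeholder approach, the start of the sample could be rewritten as well-formed XML along these lines. The attribute name fontStyleAttribute is taken from the suggestion above. Shortening the sample to a few elements is only for illustration, and the result has not been checked against the colorizer. The SeparatorTokens line is left out because the entity question is a separate matter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<EditorContext>
  <Delimiters>
    <!-- Placeholder attribute in valid name="value" form instead of [font style attributes] -->
    <CommentTokens fontStyleAttribute="value (see list below)">
      <BlockCommentStartMark>/*</BlockCommentStartMark>
      <BlockCommentEndMark>*/</BlockCommentEndMark>
    </CommentTokens>
    <QuoteTokens>
      <QuoteToken fontStyleAttribute="value (see list below)">'</QuoteToken>
      . . .
    </QuoteTokens>
  </Delimiters>
</EditorContext>
```

The list that follows the CDATA section can then explain which real attributes (font-family, color, bold, italic) the placeholder stands for.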

Eric

Apr 14, 2010 at 5:28 PM

Thanks for the clarification--that resolves the issue for me!