도전2022


Keyword Highlighting

hotdigi 2014. 1. 7. 06:52

http://www.codeproject.com/Articles/275320/Keyword-Highlighting-with-One-Line-of-Code-Applied



Keyword Highlighting with One Line of Code: Applied Use of HttpResponse.Filter in ASP.NET to Modify the Output Stream

By Yvan Rodrigues, 3 Jan 2014

Introduction

No doubt you have seen many web pages on which the results of a keyword search highlight the keyword in yellow, making it easy for the reader to find the keyword in the context in which it was found. There are, of course, many ways to approach this task.

This article discusses:

  • Implementation of the (mostly) undocumented HttpResponse.Filter property
  • Implementation of a simple search box to highlight a word or phrase on a page
  • Use of Regex.Replace with a MatchEvaluator delegate

Background

This week when I approached the implementation of keyword highlighting, I considered a few possible ways:

  1. Client-side DOM manipulation with JavaScript
  2. Search and replace on the text to which I have programmatic access
  3. An ASP.NET HTTP Module or HTTP Handler, compiled as a standalone assembly and installed in Web.config
  4. Manipulating the output stream, similar to output buffering in PHP

It was the last method that I decided to pursue, because it had the potential to operate independently of the page's code (unlike #2), wouldn't require processor-intensive client-side scripting (unlike #1), and wouldn't require any server-side configuration (unlike #3).

The example site consists of a web page that displays the text from Charles Dickens' Great Expectations. In the upper-right corner of the page floats a search box into which you can enter a word or phrase. It also presents some options, such as case-sensitive searching, whole-word searching, and searching using regular expressions instead of literal text.

Screen shot of Great Expectations without highlight

When a word or phrase is entered into the search box and the button clicked, the page is shown again with the search term highlighted throughout the document.

Screen shot of Great Expectations with highlighting

Terminology

For the sake of clarity, I'll refer to the search term or keywords as the needle. Likewise, I'll refer to the text that is being searched as the haystack. This nomenclature is also used throughout the code for consistency.

Using the Code


In the title, I promised to add highlighting to a page with one line of code. Here is the code in context:

/// <summary>
/// Handles the Load event of the Page control.
/// </summary>
/// <param name="sender">The source of the event.</param>
/// <param name="e">The <see cref="EventArgs"/> instance containing the event data.
/// </param>
protected void Page_Load(object sender, EventArgs e)
{
    // Add some content from a resource.
    Content.Text = Properties.Resources.Great_Expectations__by_Charles_Dickens;

    if(IsPostBack)
    {
        // Implement a highlighter with one line of code:
        Response.Filter = new HighlightFilter(Response, Needle.Text)    // The magic line.
        {
            IsHtml5 = false,
            MatchCase = MatchCase.Checked,
            MatchWholeWords = MatchWholeWords.Checked,
            UseRegex = UseRegularExpressions.Checked
        };

        // Don't try to highlight the search box.
        Needle.Text = string.Empty;
    }
} 

As you can see, when the Web Form is posted back, the needle is retrieved from Needle.Text. In the code-behind, we construct a HighlightFilter, passing it the HttpResponse object and the needle, and assign it to Response.Filter.

I have also set some of the properties of HighlightFilter using an object initializer. Most of them should be self-explanatory, such as MatchCase, MatchWholeWords, and UseRegex.

When IsHtml5 is true, instances of the needle are wrapped in the <mark> element, which HTML5 defines for exactly this purpose. If it is false, a div with its class set to "highlight" is used instead. For greater control, one can explicitly set the OpenTag and CloseTag properties. For ultimate control, you can subscribe to the Highlighting event and modify the supplied Haystack using the supplied Needle, or even subclass HighlightFilter entirely.
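To make the tag options concrete, here is a minimal C# sketch of how the wrapping tags might be chosen. HighlightTags, EffectiveOpenTag, EffectiveCloseTag, and Wrap are hypothetical names, and the rule that explicit OpenTag/CloseTag values take precedence over IsHtml5 is an assumption read from the paragraph above, not taken from the article's source.

```csharp
#nullable enable

// Hypothetical helper illustrating tag selection; not the article's HighlightFilter.
public class HighlightTags
{
    // Assumption: defaults to false here; the real default is not stated.
    public bool IsHtml5 { get; set; }

    // Explicit overrides; null means "derive the tag from IsHtml5".
    public string? OpenTag { get; set; }
    public string? CloseTag { get; set; }

    public string EffectiveOpenTag =>
        OpenTag ?? (IsHtml5 ? "<mark>" : "<div class=\"highlight\">");

    public string EffectiveCloseTag =>
        CloseTag ?? (IsHtml5 ? "</mark>" : "</div>");

    // Wraps one matched piece of text in the chosen tags.
    public string Wrap(string match) => EffectiveOpenTag + match + EffectiveCloseTag;
}
```

With IsHtml5 set to true, Wrap("Pip") yields <mark>Pip</mark>; setting OpenTag and CloseTag (for example to <b> and </b>) overrides both defaults.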

Of course, post-processing in this manner need not be limited to highlighting. Using the Filter class, one could subscribe to the Filtering event to modify the output stream, or subclass Filter and override the protected OnFilter method. There are numerous applications, including:

  • obfuscation
  • minification
  • altering the output of sealed classes
  • translation (e.g. RSS → HTML)
  • insertion of common code (e.g. reverse master page)
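As an illustration of the second and last bullets, here are two string-to-string transformations that a Filtering handler or an OnFilter override could apply to the buffered text. PageRewrites, CollapseWhitespace, and InsertAfterBody are hypothetical names, and both methods are deliberately crude: collapsing whitespace also alters the contents of <pre> and <textarea> elements, and neither method copes with a tag that happens to be split across two buffers.

```csharp
using System.Text.RegularExpressions;

// Hypothetical transformations a Filtering handler might apply to buffered HTML.
public static class PageRewrites
{
    private static readonly Regex WhitespaceRun = new Regex(@"\s{2,}");
    private static readonly Regex BodyOpenTag =
        new Regex(@"<body\b[^>]*>", RegexOptions.IgnoreCase);

    // Crude minification: every run of two or more whitespace characters
    // becomes a single space.
    public static string CollapseWhitespace(string html) =>
        WhitespaceRun.Replace(html, " ");

    // "Reverse master page": inject shared markup right after the first <body> tag.
    public static string InsertAfterBody(string html, string snippet) =>
        BodyOpenTag.Replace(html, m => m.Value + snippet, 1);
}
```

Inside a handler, either method would be applied to the buffered string carried by the event arguments; the article does not name that member, so the wiring is left out here.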

If you find other uses, please share with a comment.

How It Works

First, I needed a way to intercept the output stream, Page.Response.OutputStream.

A bit of searching led me to the Filter property of the HttpResponse class. The documentation for the property leaves quite a bit to the imagination. The property is assigned a Stream that filters writes, and the example refers to a magical (i.e. undocumented) UpperCaseFilterStream that takes the property itself as a parameter to its constructor, and ta-da! Hmm… (Had I bothered to find and unpack Samples.AspNet.CS.Controls, maybe I would have solved this one.)

I created the Filter class, which takes the HttpResponse object as a parameter to its constructor. The class inherits Stream, but its implementations of the abstract Stream members simply delegate to the corresponding methods and properties of the HttpResponse object's OutputStream, with the exception of Write(byte[] buffer, int offset, int count). The overridden Write method decodes the buffer to a string using the response's ContentEncoding, applies a filter, re-encodes the result, and writes it to the OutputStream.
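The wrapping idea can be sketched as a small Stream subclass. FilteringStream is a hypothetical name, and it takes a plain Func<string, string> instead of an OnFilter/Filtering mechanism, to keep the example self-contained. Like the design described above, it filters each Write call on its own, which assumes every buffer holds complete characters and complete needles; ASP.NET can call Write more than once per response, so a match (or a multi-byte UTF-8 character) straddling two buffers would be missed or garbled.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for a filter stream: forwards everything to an inner
// stream, but rewrites the text passed to Write().
public class FilteringStream : Stream
{
    private readonly Stream _inner;
    private readonly Encoding _encoding;
    private readonly Func<string, string> _filter;

    public FilteringStream(Stream inner, Encoding encoding, Func<string, string> filter)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _encoding = encoding ?? throw new ArgumentNullException(nameof(encoding));
        _filter = filter ?? throw new ArgumentNullException(nameof(filter));
    }

    // Decode, filter, re-encode, then write to the wrapped stream.
    // Assumes each buffer contains only whole characters.
    public override void Write(byte[] buffer, int offset, int count)
    {
        string text = _encoding.GetString(buffer, offset, count);
        byte[] output = _encoding.GetBytes(_filter(text));
        _inner.Write(output, 0, output.Length);
    }

    // The remaining abstract members simply delegate to the wrapped stream.
    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => _inner.Position = value;
    }
    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => _inner.SetLength(value);
}
```

In a page, such a stream would presumably be attached in the same spirit as the article's Filter: wrap Response.OutputStream, pass Response.ContentEncoding, and assign the result to Response.Filter.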

The Filter class by itself doesn't do anything useful, but its potential is unlimited. To make it filter something, one subclasses it and overrides OnFilter, or instantiates it and subscribes to the Filtering event, which passes a FilterEventArgs object containing the buffered string to be manipulated.
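The two extension routes can be sketched without any ASP.NET dependency. TextFilter, Apply, and ShoutingFilter are hypothetical, and the Text property on this FilterEventArgs is an assumption, since the article only says the event arguments carry the buffered string. In this sketch the base OnFilter raises the event, so a subclass that calls base.OnFilter keeps existing subscribers working.

```csharp
using System;

// Hypothetical shape of the event arguments: they carry the decoded text,
// which handlers may replace.
public class FilterEventArgs : EventArgs
{
    public FilterEventArgs(string text) { Text = text; }
    public string Text { get; set; }
}

// Hypothetical string-level core of a filter: subclass and override OnFilter,
// or subscribe to Filtering.
public class TextFilter
{
    public event EventHandler<FilterEventArgs> Filtering;

    // Default behaviour: raise Filtering so subscribers can edit e.Text.
    protected virtual void OnFilter(FilterEventArgs e)
    {
        Filtering?.Invoke(this, e);
    }

    // Would be called from the stream's Write method with each decoded buffer.
    public string Apply(string text)
    {
        var e = new FilterEventArgs(text);
        OnFilter(e);
        return e.Text;
    }
}

// Subclassing route: override OnFilter instead of subscribing.
public class ShoutingFilter : TextFilter
{
    protected override void OnFilter(FilterEventArgs e)
    {
        base.OnFilter(e);                    // keep any event subscribers working
        e.Text = e.Text.ToUpperInvariant();  // then apply this subclass's rewrite
    }
}
```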

For example, to implement needle highlighting, HighlightFilter inherits Filter, overriding OnFilter and adding some properties and the Highlighting event.

The overridden OnFilter method uses Regex.Replace to replace instances of the needle in the haystack. It uses the overload that takes a MatchEvaluator, a delegate that is called for each match found. This suits the task well: if MatchWholeWords is true, the characters that bound the needle are written back as they were, and the case of each match is preserved (using String.Replace, by contrast, would replace the casing of every match with that of the needle).
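A minimal sketch of the MatchEvaluator approach, assuming case-insensitive literal search; CasePreservingHighlighter and Highlight are hypothetical names. Because the evaluator rebuilds each replacement from m.Value, every match keeps the casing it had in the haystack.

```csharp
using System.Text.RegularExpressions;

public static class CasePreservingHighlighter
{
    // Wraps every case-insensitive occurrence of `needle` in the given tags.
    // The MatchEvaluator lambda receives each Match and re-emits m.Value,
    // so "Pip", "PIP", and "pip" each keep their own casing.
    public static string Highlight(string haystack, string needle,
                                   string openTag, string closeTag)
    {
        var regex = new Regex(Regex.Escape(needle), RegexOptions.IgnoreCase);
        return regex.Replace(haystack, m => openTag + m.Value + closeTag);
    }
}
```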

If UseRegex is false, the needle is simply escaped with Regex.Escape instead of using an alternate means of searching and replacing.
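One way the search-box options could be turned into a single Regex is sketched below; NeedlePattern and Build are hypothetical. For whole-word matching it uses the zero-width lookarounds (?<!\w) and (?!\w) instead of capturing the bounding characters, which is a swapped-in technique: because nothing is consumed, the neighbouring characters stay in the haystack untouched, and two matches separated by a single space are both found. Unlike \b, the lookarounds also behave sensibly for needles that begin or end with a non-word character, such as "C#".

```csharp
using System.Text.RegularExpressions;

public static class NeedlePattern
{
    // Hypothetical helper: turns the search-box options into a Regex.
    public static Regex Build(string needle, bool useRegex, bool matchCase, bool matchWholeWords)
    {
        // Literal search: escape metacharacters so "a.b" matches only "a.b".
        string pattern = useRegex ? needle : Regex.Escape(needle);

        if (matchWholeWords)
        {
            // No word character may touch either end of the match.
            // The lookarounds are zero-width, so the boundaries are not consumed.
            pattern = @"(?<!\w)(?:" + pattern + @")(?!\w)";
        }

        RegexOptions options = matchCase ? RegexOptions.None : RegexOptions.IgnoreCase;
        return new Regex(pattern, options);
    }
}
```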

I was initially concerned that using Regex for replacement with a MatchEvaluator would be prohibitively slow, but replacing common words in Great Expectations (just over one megabyte) takes a few milliseconds on my Core i7-2600K, and hopefully not much longer on a typical web server. Interestingly, enabling "Match Whole Word" increases this to several seconds.
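To check such timings on a particular server, a rough Stopwatch harness like the following could be used. HighlightTiming and Measure are hypothetical names, and the first run includes warm-up costs such as JIT compilation, so repeating the measurement a few times gives a fairer picture.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class HighlightTiming
{
    // Times a single Regex.Replace pass with a MatchEvaluator over the haystack.
    public static TimeSpan Measure(string haystack, Regex needle, out string result)
    {
        var stopwatch = Stopwatch.StartNew();
        result = needle.Replace(haystack, m => "<mark>" + m.Value + "</mark>");
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Running it over the same text with and without a whole-word pattern would show whether the slowdown described above appears on a given machine.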

Points of Interest

In my first attempt, I derived a new class from MemoryStream and assigned it to the Filter property. I overrode the Write method and manipulated the buffer by wrapping instances of the keyword in a new element to which a CSS style could be assigned.

Inspecting the contents of the stream showed that this worked quite nicely, and the class called base.Write to complete the task, but the result was zero bytes sent to the client. The sample application suggests that one may need to write out the bytes individually. Instead, I used my class to wrap the output stream.
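One plausible explanation for the zero bytes, offered as an assumption rather than a diagnosis: a class derived from MemoryStream that calls base.Write stores the filtered bytes in its own in-memory buffer, and nothing copies them on to the stream that reaches the client. The hypothetical BufferingHighlighter below reproduces that first attempt in miniature (UTF-8 and the needle "Pip" are hard-coded assumptions): the highlighted data is all there in ToArray(), but there is no inner stream to forward it to, which is what wrapping the output stream addresses.

```csharp
using System.IO;
using System.Text;

// Hypothetical reconstruction of a MemoryStream-based first attempt.
public class BufferingHighlighter : MemoryStream
{
    public override void Write(byte[] buffer, int offset, int count)
    {
        string text = Encoding.UTF8.GetString(buffer, offset, count);
        string highlighted = text.Replace("Pip", "<mark>Pip</mark>");
        byte[] output = Encoding.UTF8.GetBytes(highlighted);
        base.Write(output, 0, output.Length);   // lands in this object's own buffer only
    }
}
```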

Acknowledgements

Thank you to Project Gutenberg for the free distribution of Great Expectations and over 36,000 other works, and of course to Charles Dickens (1812-1870) himself.

History

  • October 31, 2011: Version 1.0.0.x
  • January 3, 2014: Modified title to better describe the nature of the topic

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Yvan Rodrigues
President Red Cell Innovation Inc. 
Canada
Yvan Rodrigues has 25 years of experience in information systems and software development for the manufacturing sector. He runs Red Cell Innovation Inc./L'innovation de Globules Rouges, a consulting company focusing on efficiency and automation of manufacturing and business processes for small businesses, healthcare, and the public sector. He is a Certified Technician (C.Tech.), a professional certification granted by the Institute of Engineering Technology of Ontario (IETO).
 
Yvan draws on experience at Mabel's Labels Inc. as Manager of Systems and Development, and the University of Waterloo as Information Systems Manager.
 
Yvan supports open-source software. He is a committer for SharpKit (C# to JavaScript cross-compiler) and WebIssues (Issue/Ticket Management System), and contributes to MySQL, Ghostscript, iTextSharp, Bacula, FreeBSD, MonoTouch, and Mono for Android.
 
Yvan's consumer-focused apps can be found in the Windows Store, Apple App Store, and Google Play marketplace.

