I need to display user-entered HTML on the screen. Needless to say, this has led to problems.
Is there any library that will strip out JavaScript, Input tags, etc. from HTML, leaving behind only reasonably safe tags?
Note: For reasons I can't control, I cannot filter out the garbage on input. I need to do it when the info is displayed.
Jonathan Allen
Hi,
you could probably make good use of the Anti Cross Site Scripting library: http://forums.asp.net/1107.aspx.
Grz, Kris.
Correct me if I'm wrong, but that doesn't look like it will work.
What I need to do is strip out only unsafe HTML. I have to be able to leave the other HTML behind. All the methods I saw in the docs are about encoding the output, which is the wrong thing to do.
Jonathan
I would attempt to do this using regular expressions. Are you familiar with regular expressions?
Regular expressions are not really an answer. Besides the fact that regular expressions are totally unsuitable for parsing something as complex as HTML, you still have to figure out for yourself what is safe and what isn't.
What I am looking for is a library that understands the difference between safe and unsafe HTML, and is capable of filtering out the latter.
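To illustrate the point about regular expressions, here is a minimal sketch in Python (not from the thread; the function name `naive_strip` and the pattern are hypothetical) of a simple "remove script blocks" expression and an input that gets past it. Removing the inner match reassembles a working script tag from the leftover fragments.

```python
import re

# Hypothetical deny-list pattern: remove <script ...>...</script> blocks.
SCRIPT_BLOCK = re.compile(r"<script.*?>.*?</script>", re.IGNORECASE | re.DOTALL)

def naive_strip(html_text):
    """Single-pass regex removal of script blocks (deliberately naive)."""
    return SCRIPT_BLOCK.sub("", html_text)

# The attacker nests an empty script block inside a split-up tag name.
payload = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>"

# Removing the two inner blocks leaves the outer fragments joined together.
print(naive_strip(payload))
```

Looping until the output stops changing closes this particular hole, but event-handler attributes, `javascript:` URLs, and malformed markup each need their own pattern, which is why a parser-based filter is usually preferred.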
Other than javascript, what would be an example of some unsafe HTML?
Input tags for one.
What happens is that users email us HTML copied and pasted from other sites. Sometimes this email contains input tags. Especially troubling from a stability standpoint is when the input tag has the id __viewstate. Needless to say, this breaks the real viewstate on the page.
Really, any Form tag is also a risk. Though not likely in our case, one doesn't want users creating bogus forms that post their information to an email address or a web site outside your control.
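The kind of display-time filter described above can be sketched as an allowlist built on a real HTML parser. This is a minimal illustration in Python using only the standard library, not a vetted security library. The tag list, attribute list, and URL schemes are assumptions chosen for the example, and `AllowlistSanitizer` and `sanitize` are hypothetical names. Anything not on the list (script, input, form, event-handler attributes) is dropped, and plain text is re-escaped.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist for illustration; a real policy needs its own review.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "p", "br", "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_WITH_CONTENT = {"script", "style"}  # remove these tags and the text inside them
VOID_TAGS = {"br"}                       # tags that take no closing tag
SAFE_SCHEMES = ("http://", "https://", "mailto:")

class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # input, form, and every other unlisted tag is dropped
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # drops onclick, style, id, and similar
            value = (value or "").strip()
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # drops javascript: and other unlisted schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

def sanitize(html_text):
    """Return html_text with everything outside the allowlist removed."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

dirty = ('<p>Hi <b>there</b></p><script>alert(1)</script>'
         '<form action="http://evil.example"><input id="__viewstate" value="x"></form>'
         '<a href="javascript:alert(1)" onclick="x()">link</a>')
print(sanitize(dirty))
```

The sketch does not rebalance unclosed tags, so a production filter would still want a maintained sanitizer library with a reviewed policy.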
Well, from what I can understand, you're creating an even stronger case for using regular expressions. I feel I'm not completely understanding the scenario, though.
It sounds like you'll need to think about what you consider to be unsafe, and create a list. Armed with this, you'll need to create a list of expressions to match and remove (or replace with nothing).
I'm still not 100% clear on how or why you need to do this, though. Can you talk me through an example, or is there a URL you can direct me to?
From what you've said, I'm assuming that people are filling in a form which emails you the code they entered (or something similar), and that's causing the problem? Perhaps you could try using the deprecated <xmp> ... </xmp> tag? (Strictly speaking you shouldn't, though, I know.)