Tuesday, March 13, 2012

stuck on a REGEX (\S[^\s/>]*)

I'm trying to find the opening < and the text of a tag (without the
attributes or closing tags)

This is what I'm using:

(\S[^\s/>]*)

Which, I think, reads as:

(any number of non-whitespace characters [up to a space, /, or >])

Is that correct? I can't get it to work.

If my text is:

<tag

then it returns "<tag" which is what I want.

However, if I have:

<tag/ or <tag>
it instead matches "/" or ">" respectively.

Why?

darrel wrote:
> I'm trying to find the opening < and the text of a tag (without the
> attributes or closing tags)
> This is what I'm using:
> (\S[^\s/>]*)
> Which, I think, reads as:
> (any number of non-whitespace characters [up to a space, /, or >])
> Is that correct? I can't get it to work.
> If my text is:
> <tag
> then it returns "<tag" which is what I want.
> However, if I have:
> <tag/ or <tag>
> it instead matches "/" or ">" respectively.
> Why?

In my brief testing, when run against "<tag/" it first matches "<tag" -
then the next match is "/". The second match matches "/" because it
matches the \S character class.
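
The behavior described above can be reproduced with a short sketch. It uses Python's re module rather than the .NET Regex class the thread appears to target (an assumption, made only so the example is easy to run); this particular pattern should behave the same way in both engines. The helper name all_matches is made up for illustration.

```python
import re

# The pattern from the question, without the capturing parentheses.
TAG_START = re.compile(r"\S[^\s/>]*")

def all_matches(text):
    """Return every non-overlapping match, scanning left to right."""
    return TAG_START.findall(text)

print(all_matches("<tag"))    # ['<tag']
print(all_matches("<tag/"))   # ['<tag', '/']
print(all_matches("<tag>"))   # ['<tag', '>']
```

After "<tag" is consumed, the scan resumes at the "/" or ">", and the leading \S is free to match it.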

Post some examples of how you want the regex to behave, and maybe
someone can help put one together.

--
mikeb

> In my brief testing, when run against "<tag/" it first matches "<tag" -
> then the next match is "/". The second match matches "/" because it
> matches the \S character class.

But shouldn't this: [^/] stop it from doing that?
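
A sketch of why the negated class does not stop it, in Python as a stand-in for .NET (an assumption): [^\s/>]* only restricts the characters after the first one, while the leading \S accepts any non-whitespace character, "/" and ">" included. The stricter pattern below is a hypothetical variant, not something proposed in the thread, that applies the same restriction to every character.

```python
import re

original = re.compile(r"\S[^\s/>]*")  # first character: anything but whitespace
stricter = re.compile(r"[^\s/>]+")    # every character: no whitespace, "/" or ">"

for text in ("<tag/", "<tag>"):
    # The original pattern yields a second match on the delimiter;
    # the stricter variant cannot start a match on "/" or ">".
    print(text, original.findall(text), stricter.findall(text))

# <tag/ ['<tag', '/'] ['<tag']
# <tag> ['<tag', '>'] ['<tag']
```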

Here's how I want the regex to behave:

I want to find the first 'word' in the string. This would be any number of
characters in a row up to (but not including) a space, a new line, a /, or a >.

so in this:

"hello there, how are you"

it should match 'hello'

in this:

"<blockquote>hello there, how are you"

it should match '<blockquote'

Thanks!

-Darrel

> But shouldn't this: [^/] stop it from doing that?

Aha. Mike, you are correct!

Here's what's happening. If this is my text:

<blockquote>monkey</blockquote>
and this is my Regex:

\S[^>]*

It returns these matches:

<blockquote
>monkey</blockquote
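
To see where each match starts, the matches can be listed with their offsets. This is a Python sketch (an assumption; the thread appears to use .NET), run on the complete string including its closing ">". In that case a third match, the final ">", shows up as well.

```python
import re

text = "<blockquote>monkey</blockquote>"
pattern = re.compile(r"\S[^>]*")

# Collect (start offset, matched text) for every non-overlapping match.
matches = [(m.start(), m.group(0)) for m in pattern.finditer(text)]

for start, value in matches:
    print(start, repr(value))

# 0 '<blockquote'
# 11 '>monkey</blockquote'
# 30 '>'
```

Each match ends just before a ">", and the next scan starts on that ">", which \S accepts.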

So, it's returning the last match, I suppose. This is where I get lost. How
do I get it to ONLY return the first match?

Got it!

The problem was the very next group I was using.

I had this:

(\S[^\s/>]*)
but had to add another group:
(\s|\n[^\S>]*)|(>))
which checks for whitespace/new lines OR a closing tag.
-Darrel

Use the Match class of the regular expression object:

Dim m As Match = yourRegEx.Match(yourString)

m will hold the first match.
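
For comparison, a rough Python equivalent: search() in Python's re module plays the role that Regex.Match plays in .NET and returns only the first match. Python is used here as an assumption so the snippet is easy to run, and first_match is a hypothetical helper name.

```python
import re

pattern = re.compile(r"\S[^\s/>]*")

def first_match(text):
    """Return the first match as a string, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_match("<blockquote>monkey</blockquote>"))  # <blockquote
print(first_match("hello there, how are you"))         # hello
print(first_match("   "))                              # None
```

Asking for a single match avoids the later matches on "/" or ">" entirely, since the scan never resumes after the first hit.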

"darrel" wrote:

> > But shouldn't this: [^/] stop it from doing that?
> Aha. Mike, you are correct!
> Here's what's happening. If this is my text:
> <blockquote>monkey</blockquote>
> and this is my Regex:
> \S[^>]*
> It returns these matches:
> <blockquote
> >monkey</blockquote
> So, it's returning the last match, I suppose. This is where I get lost. How
> do I get it to ONLY return the first match?
