Regular expressions as we basically know them today were made for ed. In that context, '$' absolutely had to match the terminating newline or it would've been completely useless.
flufluflufluffy 4 days ago [-]
The vast majority of the time I use ^/$, I actually want the behavior of matching start/end of lines. If I had some multi-line text and only wanted to update or do something with the actual beginning or end of the entire text, I’d typically just do it manually.
theamk 4 days ago [-]
A lot of the time I want to check for a valid identifier:
import re

if not re.match('^[a-z0-9_]+$', user):
    raise SomeException("invalid username")
As written, the code above is incorrect: it will happily accept "john\n", which can cause all sorts of havoc down the line.
extraduder_ire 4 days ago [-]
Shouldn't you use the match returned from the string? Or use .fullmatch() (added in Python 3.4) to match the whole string.
theamk 4 days ago [-]
In general, no, you should not use the match from the string. If you are getting input from a user, you want more complex processing (like stripping all whitespace), and if you are getting input from API calls, you want to either use the specified name as-is or fail.
Yes, fullmatch() will help, and so will \Z. It's just that it is so easy to forget...
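For reference, a minimal sketch of the difference in Python's `re` module. `is_valid_username` is a hypothetical helper for illustration, not anything from the comments above:

```python
import re


def is_valid_username(user: str) -> bool:
    """Return True only if the *whole* string is [a-z0-9_]+ (no trailing newline)."""
    # fullmatch anchors at both ends; re.match(r'[a-z0-9_]+\Z', user) would also work.
    return re.fullmatch(r'[a-z0-9_]+', user) is not None


# '$' also matches just before a final "\n", so this accepts "john\n".
print(bool(re.match(r'^[a-z0-9_]+$', "john\n")))   # True
# '\Z' only matches at the very end of the string.
print(bool(re.match(r'^[a-z0-9_]+\Z', "john\n")))  # False
print(is_valid_username("john"))    # True
print(is_valid_username("john\n"))  # False
```

One argument for `fullmatch` over `\Z` is that the anchoring lives in the method name rather than in the pattern, so there is one less symbol to forget.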
seanwilson 4 days ago [-]
I wish one of those regex libraries that replace the regex symbols with human-readable words would become standard. Or do they not work well?
Regex is one of those things where I have to look up to remind myself what the symbols are, and by the time I need this info again I've forgotten it all.
I can't think of anywhere else in general programming where we have something so terse and symbol heavy.
db48x 4 days ago [-]
It’s been done. Emacs, for example, has rx notation. From the manual:
35.3.3 The ‘rx’ Structured Regexp Notation
------------------------------------------
As an alternative to the string-based syntax, Emacs provides the
structured ‘rx’ notation based on Lisp S-expressions. This notation is
usually easier to read, write and maintain than regexp strings, and can
be indented and commented freely. It requires a conversion into string
form since that is what regexp functions expect, but that conversion
typically takes place during byte-compilation rather than when the Lisp
code using the regexp is run.
Here is an ‘rx’ regexp(1) that matches a block comment in the C
programming language:
(rx "/*"                          ; Initial /*
    (zero-or-more
     (or (not "*")                ;  Either non-*,
         (seq "*"                 ;  or * followed by
              (not "/"))))        ;  non-/
    (one-or-more "*")             ; At least one star,
    "/")                          ; and the final /
or, using shorter synonyms and written more compactly,
(rx "/*"
    (* (| (not "*")
          (: "*" (not "/"))))
    (+ "*") "/")
In conventional string syntax, it would be written
"/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
Of course, it does have one disadvantage. As the manual says:
The ‘rx’ notation is mainly useful in Lisp code; it cannot be used in
most interactive situations where a regexp is requested, such as when
running ‘query-replace-regexp’ or in variable customization.
Raku has also advanced the state of the art considerably.
The Python ecosystem has several options, for instance: https://parsy.readthedocs.io/en/latest/tutorial.html
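Without leaving the standard library, Python's `re.VERBOSE` flag offers a partial middle ground: whitespace in the pattern is ignored and `#` starts a comment. A rough sketch, porting the C block-comment regex from the Emacs manual excerpt above (the Python translation is my own assumption, not something the manual provides):

```python
import re

# The compact pattern, translated from the Emacs string syntax quoted above.
C_COMMENT_COMPACT = re.compile(r"/\*(?:[^*]|\*[^/])*\*+/")

# The same pattern with re.VERBOSE: whitespace is ignored and '#' starts a
# comment, so it can be laid out roughly like the rx form.
C_COMMENT = re.compile(r"""
    /\*            # initial /*
    (?:            # zero or more of:
        [^*]       #   a non-* character,
      | \*[^/]     #   or a * followed by a non-/
    )*
    \*+            # at least one star,
    /              # and the final /
""", re.VERBOSE)

source = "int x; /* one */ int y; /* two **/"
print(C_COMMENT.findall(source))  # ['/* one */', '/* two **/']
print(C_COMMENT.findall(source) == C_COMMENT_COMPACT.findall(source))  # True
```

It is still a string pattern, so you don't get the structural checking of `rx` or a parser library, but the comments at least live next to the symbols they explain.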
For this to bite me, I would have to be:
* running a regex not in multi-line mode
* on input that was presumably split from multiple lines, or within a line of multi-line input
* wherein I care whether the line in question is the last line of input without a trailing newline
* but I didn't check, or `.strip()` or anything
I can't say I recall ever being bitten by this.
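The scenario above is easy to check directly. A small sketch, assuming the input is read with `readlines()` (iterating over a file object behaves the same way); the sample data is made up:

```python
import io
import re

# Hypothetical input: every line but the last ends in "\n".
lines = io.StringIO("alice\nbob\ncarol").readlines()
print(lines)  # ['alice\n', 'bob\n', 'carol']

# Without MULTILINE, '$' matches at the end of the string or just before a
# final "\n", so every line passes, with or without a trailing newline.
print([bool(re.match(r'^[a-z]+$', line)) for line in lines])  # [True, True, True]

# fullmatch (or \Z) only accepts the last line, the one without "\n".
print([bool(re.fullmatch(r'[a-z]+', line)) for line in lines])  # [False, False, True]
```

In other words, for line-by-line processing the lenient `$` is what keeps the last line from behaving differently from the others.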
And there is also nothing here to justify \A over ^.
https://docs.python.org/3/library/re.html#re.MULTILINE
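A quick sketch of what the flag linked above changes for `^` versus `\A` (the sample text is arbitrary):

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, '^' and '\A' both match only at the start of the string.
print(re.findall(r'^\w+', text))   # ['first']
print(re.findall(r'\A\w+', text))  # ['first']

# With MULTILINE, '^' also matches right after each newline; '\A' does not.
print(re.findall(r'^\w+', text, re.MULTILINE))   # ['first', 'second']
print(re.findall(r'\A\w+', text, re.MULTILINE))  # ['first']
```

So `\A` only differs from `^` once MULTILINE is turned on.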
And it is the same in Perl, from `man perlre`:
In this particular case the default is that $ matches the end of a string without a newline, but you can include it any time you need to: