Use "\A \z", not "^ $" with Python regular expressions

flufluflufluffy · 2026-01-28T17:05:11 1769619911

The vast majority of the times I use ^/$, I actually want the behavior of matching start/end of lines. If I had some multi-line text, and only wanted to update or do something with the actual beginning or end of the entire text, I’d typically just do it manually.

theamk · 2026-01-28T18:16:07 1769624167

A lot of the time I want to check for a valid identifier:

    if not re.match('^[a-z0-9_]+$', user):
        raise SomeException("invalid username")

As written, the code above is incorrect: it will happily accept "john\n", which can cause all sorts of havoc down the line.
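
A minimal sketch of the pitfall and two possible fixes, using the same character class as above. The function names are hypothetical, chosen only for illustration. The sketch uses `\Z`, Python's long-standing end-of-string anchor; my understanding is that the `\z` spelling from the title is only accepted by recent Python releases, so it is left out here.

```python
import re

def is_valid_loose(user: str) -> bool:
    # '$' also matches just before a trailing newline, so "john\n" passes.
    return re.match(r'^[a-z0-9_]+$', user) is not None

def is_valid_strict(user: str) -> bool:
    # \A and \Z anchor to the very start and very end of the string.
    return re.match(r'\A[a-z0-9_]+\Z', user) is not None

def is_valid_fullmatch(user: str) -> bool:
    # re.fullmatch requires the whole string to match; no anchors needed.
    return re.fullmatch(r'[a-z0-9_]+', user) is not None

print(is_valid_loose("john\n"))      # True
print(is_valid_strict("john\n"))     # False
print(is_valid_fullmatch("john\n"))  # False
```

`re.fullmatch` is arguably the least error-prone of the three, since there is no anchor to get wrong.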

Joker_vD · 2026-01-28T10:11:44 1769595104

Regular expressions as we basically know them today were made for ed. In that context, '$' absolutely had to match the terminating newline or it would've been completely useless.

seanwilson · 2026-01-28T17:54:43 1769622883

I wish one of those regex libraries that replace the regex symbols with human-readable words would become standard. Or do they not work well?

Regex is one of those things I have to look up each time to remind myself what the symbols mean, and by the time I need this info again I've forgotten it all.

I can't think of anywhere else in general programming where we have something so terse and symbol heavy.

db48x · 2026-01-28T18:10:04 1769623804

It’s been done. Emacs, for example, has rx notation. From the manual:

    35.3.3 The ‘rx’ Structured Regexp Notation
    ------------------------------------------
    
    As an alternative to the string-based syntax, Emacs provides the
    structured ‘rx’ notation based on Lisp S-expressions.  This notation is
    usually easier to read, write and maintain than regexp strings, and can
    be indented and commented freely.  It requires a conversion into string
    form since that is what regexp functions expect, but that conversion
    typically takes place during byte-compilation rather than when the Lisp
    code using the regexp is run.
    
       Here is an ‘rx’ regexp(1) that matches a block comment in the C
    programming language:
    
         (rx "/*"                    ; Initial /*
             (zero-or-more
              (or (not "*")          ;  Either non-*,
                  (seq "*"           ;  or * followed by
                       (not "/"))))  ;     non-/
             (one-or-more "*")       ; At least one star,
             "/")                    ; and the final /
    
    or, using shorter synonyms and written more compactly,
    
         (rx "/*"
             (* (| (not "*")
                   (: "*" (not "/"))))
             (+ "*") "/")
    
    In conventional string syntax, it would be written
    
         "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"

Of course, it does have one disadvantage. As the manual says:

       The ‘rx’ notation is mainly useful in Lisp code; it cannot be used in
    most interactive situations where a regexp is requested, such as when
    running ‘query-replace-regexp’ or in variable customization.

Raku also has advanced the state of the art considerably.
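
For comparison with the rx example above, here is a rough sketch of the same C block-comment pattern in Python, using the standard `re.VERBOSE` flag to allow whitespace and comments inside the pattern. It is not a structured notation like rx, only a more readable layout of the string syntax. The name `C_BLOCK_COMMENT` and the sample input are made up for illustration.

```python
import re

# Same pattern as the conventional string above, in Python regex syntax.
C_BLOCK_COMMENT = re.compile(r"""
    /\*                 # initial /*
    (?:
        [^*]            # either a non-*,
      | \*[^/]          # or a * followed by a non-/
    )*
    \*+                 # at least one star,
    /                   # and the final /
""", re.VERBOSE)

source = "int x; /* set x ** later */ x = 1;"
print(C_BLOCK_COMMENT.search(source).group())  # /* set x ** later */
```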

eviks · 2026-01-28T11:04:54 1769598294

so why \A instead of ^?

tkocmathla · 2026-01-28T12:47:21 1769604441

\A always matches the start of the string, but in multiline mode, ^ will match both the start of the string and the start of each line:

https://docs.python.org/3/library/re.html#re.MULTILINE
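
A small sketch of the difference, with a made-up two-line string:

```python
import re

text = "first\nsecond"

# Without MULTILINE, ^ only matches at the start of the string.
print(re.findall(r'^\w+', text))                 # ['first']

# With MULTILINE, ^ also matches right after each newline.
print(re.findall(r'^\w+', text, re.MULTILINE))   # ['first', 'second']

# \A ignores the flag and only matches at the start of the string.
print(re.findall(r'\A\w+', text, re.MULTILINE))  # ['first']
```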

svilen_dobrev · 2026-01-28T16:00:19 1769616019

It's in the spec. Since forever, like v1.3? I don't remember.

And it is the same in Perl. From `man perlre`:

   ^   Match the beginning of the string  (or line, if /m is used)

autoexec · 2026-01-28T11:53:12 1769601192

I've said it before and I'll say it again, I'd like Python a lot more if it abandoned re and handled regex like perl did.

edflsafoiewq · 2026-01-28T19:28:13 1769628493

I've never used perl. What's the difference?

instig007 · 2026-01-28T18:45:51 1769625951

ABC: Always. Build on. Parser Combinators.

Python ecosystem has several options, for instance: https://parsy.readthedocs.io/en/latest/tutorial.html

az09mugen · 2026-01-28T12:22:24 1769602944

They could simply advise to use boundaries '\b' instead.

notpushkin · 2026-01-28T20:52:05 1769633525

Which would also match whitespace in addition to the \n they’re trying to avoid matching?
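
A quick sketch of why `\b` does not help here, assuming the username check from earlier in the thread: `\b` is a zero-width word-boundary assertion, not a start/end anchor, so it is satisfied next to a newline, a space, or any other non-word character.

```python
import re

pattern = re.compile(r'\b[a-z0-9_]+\b')

# The boundary between "n" and "\n" satisfies \b, so the newline is tolerated.
print(pattern.match("john\n") is not None)   # True

# A space is a word boundary too, so only the first word is matched.
print(pattern.search("john smith").group())  # john

# Requiring the whole string to match is what rejects the trailing newline.
print(pattern.fullmatch("john\n") is None)   # True
```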