Range expressions: discrepancies between rustc and parser-lalr
I'd guess that the last test case here is a rustc bug, but I don't know about the others.
fn match_rangefull() {
match .. { _ => () } // rustc compiles it, parser-lalr rejects
}
fn field_of_range() {
123.. .start; // rustc rejects, parser-lalr accepts
}
fn eq_rangefull() {
.. == ..; // rustc rejects, parser-lalr accepts
}
fn return_rangeto() -> std::ops::RangeTo<i32> {
return ..5; // rustc compiles it, parser-lalr parses it as (return)..5
}
fn assign_range() {
let x;
x = 4..5; // rustc compiles it, parser-lalr parses it as (x = 4)..5
}
fn nonblock_expr_range() {
// rustc compiles this, and I don't think it should. I think parser-lalr
// gets it right.
//
// rustc parses it like: let _one = (1..).start;
// parser-lalr parses it like: let _one = {1; ..}.start;
//
let _one = { {1} .. }.start;
}
fn main() {}
Also see https://github.com/rust-lang/rust/issues/25119.
Looking through https://github.com/rust-lang/rust/issues/20811, I see two more cases where
rustcandparser-lalrdisagree:fn rangefrom_compare() { 1.. == (1..); // rustc syntax error, parser-lalr parses as (1..) == (1..) } fn compare_rangeto() { (..1) == ..1; // rustc syntax error, parser-lalr parses as (..1) == (..1) }Ryan Prichard at 2015-10-01 06:15:39
triage: P-medium
This is a backwards incompat issue, but feels like an edge case. Nonetheless we should definitely fix it!
Niko Matsakis at 2015-10-01 21:38:09
Can you explain why you think that
{ {1} .. }.startshould parse as{1; ..}.start? I'm not familiar with the details of Rust's semicolon insertion.I would expect
{}within an expression to work pretty much wherever()does (except in if conditions / match expressions or similar, where{}starts the control flow block), and((1)..).startis definitely valid syntax.On all other examples, I'd say the rustc behavior is correct. The decision in #21374 was:
..is non-associative has the same precedence as the assignment operator. The unary forms of..are only valid at the same precedence level as the binary form.Comparing ranges needs parentheses: the attempt to give
..a higher precedence than the comparison operators in #20811 was not adopted because it caused new ambiguities between unary and binary&&, e.g. inr == 1.. && condition.Admittedly, having
..at the same precedence level as assignment is somewhat weird, as the operators on that level have different associativity (assignment is right-associative;..is non-associative). It would be better to put..on its own level directly above the assignment operator and placement-new-arrow.Daniel Grunwald at 2015-11-04 14:10:01
Regarding
{ {1} .. }.start, my reasoning was based on the idea that a "statement-like" expression in statement position cannot start a binary expression. e.g.let x = { {} - 1 };parses and type-checks because it's the same thing as:let x = { {} -1 }(Rust ignores newlines outside of string-literals, AFAIK.)
let x = { () - 1 };doesn't type-check.@nikomatsakis made some comments to this effect here:
- https://github.com/rust-lang/rust/issues/28379#issuecomment-144505499
- https://github.com/rust-lang/rust/issues/28379#issuecomment-144542236
Of course,
1..is a unary expression, not a binary expression.FWIW, I noticed that the current nightly Rust compiler is now parsing
{ {1} .. }.startdifferently -- the..is parsed as a nullary operator (i.e. as-if there were a semicolon after the{1}).Ryan Prichard at 2015-11-05 03:59:08
That makes sense; I didn't realize we had that rule about
{}not starting binary expressions.It also fits what I wrote in #25119:
..is only accepted as expression in locations wherea..bwould also be accepted.{ {1} a..b }parses as if there's a semicolon after the{1}, so the same should apply to{ { 1 } .. }.In a formal grammar, the restrictions around
..are easily expressed by using a production for every precedence level:asgn_expr : range_expr | range_expr "=" asgn_expr ; range_expr : logic_or_expr | logic_or_expr ".." logic_or_expr | logic_or_expr ".." | ".." logic_or_expr | ".." logic_or_expr : logic_and_expr | logic_or_expr "||" logic_and_expr ;Is there any way to do this with the
%precedencemechanism? A production for each precedence level would bloat the grammar quite a bit, given that they get multiplied with the restrictions (normal, nonblock, nonparen, nostruct).Daniel Grunwald at 2015-11-05 16:21:11
I think we can reduce the problem to:
expr : DOTDOT | expr DOT ID | ID {} ;We need to reject
DOTDOT DOT ID. My full test case is here. Menhir and yacc both think my precedence declarations are useless. I suspect the precedence mechanism doesn't do what we want.(Aside: I'm finding
menhir --interpret --interpret-show-cstuseful for playing with grammars. You can typeDOTDOT DOT IDand see what CST is parsed, or whether it's rejected.)Some parser generators (e.g. menhir) allowing parameterizing rules. e.g. Maybe a formal grammar could have a single
range_exprrule that is parameterized over each of the restrictions.(Also: I get the impression that the nonparen restriction class might be going away. It's needed only for the placement new
box (xxx) yyysyntax, which has changed tobox in xxx { yyy }.)Ryan Prichard at 2015-11-05 17:58:40
Actually, I'm starting to think that the rustc behavior might not be ideal: it interacts badly with lambda syntax.
Rust currently has 7 non-associative operators:
..and the 6 comparison operators. But within lambdas, these behave differently!Rustc currently rejects
|a, b| a == b == cas a syntax error (cannot chain comparison operators). But it parses|a, b| a .. b .. csuccessfully, as(|a, b| (a .. b)) .. c.Lambdas have the same "precedence inversion" problem that I tried to avoid for (unary)
..: they can appear within operators with high precedence, but contain an arbitrary expression with lower precedence.For example,
a * || b + cis a syntactically valid expression, parsing asa * (|| (b + c)). A programmer who knows about the precedence of*and+but not about the effect of lambdas would probably expect(a * (|| b)) + cinstead.This gets especially problematic when the programmer assumes the lambda body is an expression, but the expression actually ends earlier. Is
|a, b| a .. b .. cthe only case of a valid expression that "breaks out" of the lambda?One solution would be to parse
..as parser-larl does, and then later reject syntax trees where..appears as operand of another operator with higher precedence. (similar to how chained comparisons are being rejected)The other option would be to disallow lambdas from appearing as left-hand-operand of a binary operator (just like we already disallow blocks in such a position).
Daniel Grunwald at 2015-11-08 13:36:46
|a, b| a .. b .. cisn't the only case, I found another one. The following is a valid rust program, but probably shouldn't be:fn main() { || println!("Hello World") as ()() }I think we should disallow lambdas from appearing as left-hand-operand of a binary operator or as operand of a suffix expression.
Daniel Grunwald at 2015-11-08 13:46:17
On Thu, Nov 05, 2015 at 09:59:14AM -0800, Ryan Prichard wrote:
Some parser generators (e.g. menhir) allowing parameterizing rules. e.g. Maybe a formal grammar could have a single
range_exprrule that is parameterized over each of the restrictions.Just wanted to point out that LALRPOP supports this sort of parameterization (and generates Rust). At some point I want to go ahead and port one of these LALR Rust grammars to LALRPOP.
Niko Matsakis at 2015-11-09 19:26:45
@nikomatsakis can mentor. Needs investigation into the correct grammar.
Brian Anderson at 2016-08-04 16:25:41
@nikomatsakis Could you either provide some instructions here, or otherwise unmark as E-mentor?
Mark Rousskov at 2017-05-10 19:35:30
Triage; is parser-lalr still a thing? is this issue relevant?
also, removing e-mentor
Steve Klabnik at 2018-10-31 16:19:45
Apparently
|| println!("Hello World") as ()()was fixed at some point by the introduction of a "casts cannot be followed by a function call" error, but the issues with the interaction of..and lambdas still exist.Daniel Grunwald at 2021-12-30 14:22:52
- Yep: https://github.com/rust-lang/rust/pull/68985
David Tolnay at 2021-12-30 20:29:17