Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Parse] Statement typo corrections #28510

Closed

Conversation

kitaisreal
Copy link
Contributor

@kitaisreal kitaisreal commented Nov 30, 2019

Current implementation steps:

  1. If consecutive statement error occurred we are getting previous identifier, then trying to typo correct it to one of the statements keywords.
  2. Updated tests

Current implementation is limited to only identifiers that start from the beginning of the line. Like

gaurd x>5 else { return }

The limitation exists because it is ambigous how to handle case where

x=1+i i=i+1

Should we correct it to if statement, or provide insert of ';'.

https://bugs.swift.org/browse/SR-8510

@kitaisreal kitaisreal changed the title SR-8610: Parser statement typo corrections [PARSER] Statement typo corrections Nov 30, 2019
@kitaisreal kitaisreal changed the title [PARSER] Statement typo corrections [Parse] Statement typo corrections Nov 30, 2019
///// \returns the score, if it is good enough to even consider this a match.
///// Otherwise, an empty optional.
/////
static Optional<unsigned> scoreIdentifierAndStmtNameTypo(StringRef identifier,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are identical function used for matching stringRefs in CSSimplify.cpp. Can it be moved to StringExtras?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can get away with a table of keywords and a loop checking edit_distance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Fixed.

@@ -925,6 +925,8 @@ ERROR(statement_begins_with_closure,none,
"top-level statement cannot begin with a closure expression", ())
ERROR(statement_same_line_without_semi,none,
"consecutive statements on a line must be separated by ';'", ())
ERROR(statement_misspell,none,
"did you misspell '%0'", (StringRef))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits:

  1. The phrasing here is of a note, but the diagnostic is an error. Take a look at how we do typo correction diagnostics in libSema.
  2. Rhetorical questions need punctuation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Fixed.

let test = 5
gaurd test > 5 else { return } // expected-error {{did you misspell 'guard'}} {{3-8=guard}} expected-error {{use of unresolved identifier 'gaurd'}}
i test > 5 { return } // expected-error {{did you misspell 'if'}} {{3-4=if}} expected-error {{use of unresolved identifier 'i'}}
whle test > 5 { return } // expected-error {{did you misspell 'while'}} {{3-7=while}} expected-error {{use of unresolved identifier 'whle'}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we recover better here? (Would it make sense to use the nearest-neighbor match to start parsing the corresponding statement kind?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed. Please check my comment below.

}
if (stmtTokMatch) {
ParserPosition parserPosition = ParserPosition(L->getStateForBeginningOfToken(previousToken), PreviousLoc);
backtrackToPosition(parserPosition);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think you have to backtrack. You only need the location of the typo’d keyword.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I used backtrack to update previous token with our corrected token kind and continue parsing as there were no error.
Maybe there are better way to recover there for example just skip tokens until we met end of statement.
Do we have any other options @rintaro ?

Copy link
Member

@rintaro rintaro Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately,setKind() to a keyword kind is not not an option because we have an invariant in libSyntax where keyword tokens have always have the fixed text.

An option (which @CodaFi is suggesting) is to make parseStmt*** functions receive keyword location like

ParserResult<Stmt> Parser::parseStmtReturn(SourceLoc tryLoc, SourceLoc ReturnLoc) {
  if (ReturnLoc.isInvalid())
    ReturnLoc = consumeToken(tok::kw_return);

  ...
}

Then directly call them from this recovery logic with the location. But that's probably not simple changes considering TopLevelCodeDecl handling.

So, I recommend to skip to the next statement, decl, or '}' using skipUntilDeclStmtRBrace(). Downside of this is that we loose the diagnostics and semantic syntax highlighting until the next statement. But I think that's better than lousy diagnostics.

Copy link
Contributor Author

@kitaisreal kitaisreal Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, totally agree. I also thought that approach with backtracking and setting kind is too unnatural. Related to skipUntilDeclStmtRBrace(). If we start using such diagnostics there are case like this.

func bar() {
    let i = 5
    gaurd i > 5 else { return }
    gaurd i > 5 else { return }
}

After first missed guard I want to skip until rBrace or next stmt. But currently in skipSingle I find out that we are skipping to rBrace when we find out lBrace without actually going through checks in skipUntilDeclStmtRBrace().

void Parser::skipSingle() {
  switch (Tok.getKind()) {
...
  case tok::l_brace:
    consumeToken();
    skipUntil(tok::r_brace);
    consumeIf(tok::r_brace);
    break;
void Parser::skipUntilDeclStmtRBrace(tok T1) {
  /// here Tok could not be '}' because we are skipping it in skipSingle()
  while (Tok.isNot(T1, tok::eof, tok::r_brace, tok::pound_endif,
                   tok::pound_else, tok::pound_elseif,
                   tok::code_complete) &&
         !isStartOfStmt() && !isStartOfDecl()) {
    skipSingle();
  }
}

And in case like I posted from above gaurd is not a startOfStmt and we are just skipping to the end of the function.
We can make 2 variants:

  1. Introduce function possibleStartOfStmt() and update skipUntilDeclStmtRBrace.
  2. Just create other skip function with correct handling rBrace.
    What approach is more suitable @rintaro ?

Copy link
Member

@rintaro rintaro Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the implementation of skipSingle() is not wrong. This is for example:

func foo() {
  if /*HERE*/ foo {
    return 1
  }
  return 2
}

When skipping from /*HERE*/ to the next statement, we don't care about the contents of the if statement. We just want to skip to return 2 at line 5.

Introduce function possibleStartOfStmt() and update skipUntilDeclStmtRBrace.

Could you elaborate on this? How will possibleStartOfStmt() be different from isStartOfStmt()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, agree. Lets keep it simple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @rintaro after investigation I find out there simple usage of skipUntilDeclStmtRBrace() is not sufficient for cases like

// SR-8510
func foo() -> Int? { return 0 }
func bar() {
  gaurd /*HERE*/ let x = foo() else { return }  // expected-error {{consecutive statements}} // expected-note {{did you misspell 'guard'}} {{3-8=guard}}
  print(x)
}

We will stop here and then emit a bunch of unrelated errors. Are there any workarounds around this, if we are trying to use skipUntilDeclStmtRBrace() ?
Should we consider introducing another skipping function ?

void Parser::skipUntilStmtRBrace() {
  while (Tok.isNot(tok::eof, tok::r_brace, tok::pound_endif,
                 tok::pound_else, tok::pound_elseif,
                 tok::code_complete) &&
       !isStartOfStmt())
  skipSingle();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that's unfortunate. I think that is a general skipUntilDeclStmtRBrace() problem where it cannot skip statement conditions with optional binding syntax. I noticed that actually causes spurious diagnostics:

var foo: Int? 
if
  a == ],
  let b = foo {
}
test.swift:4:8: error: expected expression after operator
  a == ],
       ^
test.swift:5:11: error: cannot call value of non-function type 'Int?'
  let b = foo {
          ^

Hmm...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, how about just skipping the first Tok.isAny(tok::kw_let, tok::kw_var) in this recovery logic? like:

if (Tok.isAny(tok::kw_let, tok::kw_var))
  consumeToken();
skipUntilDeclStmtRBrace()

It's not great for something like:

gurad
  let a = someOpt1,
  let b = someOpt2
else {
}

But I think supporting single let/var is good enough because this is just a recovery.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @rintaro updated pull request. Currently recovery logic will work only if next token is let/var and wrong typed statement is on the beginning of the line. Also updated tests.

@CodaFi
Copy link
Member

CodaFi commented Nov 30, 2019

I left some high-level comments about the approach here. I think @rintaro is more qualified than I am to get into the finer details.

Thanks for tackling this. It’s a tough nut to crack.

@CodaFi CodaFi requested a review from rintaro November 30, 2019 20:54
@@ -0,0 +1,13 @@
// RUN: %target-typecheck-verify-swift

func statementMisspelledAtTheBegginingOfTheLine() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the functions from the SR too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Added.

func statementMisspelledNotAtTheBegginingOfTheLine() {
let i = 5
var x = i x=x+1 // expected-error {{consecutive statements}}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Trailing newline please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@kitaisreal kitaisreal force-pushed the parser-statement-typo-corrections branch from 84a18c6 to 9870352 Compare December 1, 2019 11:53
}
}
if (stmtTokMatch) {
Entries.pop_back();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are removing latest entry, because it is unresolved identifier expression from parse expression. If we will not remove it undefined identifier error will be emitted by Sema.

}
if (stmtTokMatch) {
ParserPosition parserPosition = ParserPosition(L->getStateForBeginningOfToken(previousToken), PreviousLoc);
backtrackToPosition(parserPosition);
Copy link
Member

@rintaro rintaro Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately,setKind() to a keyword kind is not not an option because we have an invariant in libSyntax where keyword tokens have always have the fixed text.

An option (which @CodaFi is suggesting) is to make parseStmt*** functions receive keyword location like

ParserResult<Stmt> Parser::parseStmtReturn(SourceLoc tryLoc, SourceLoc ReturnLoc) {
  if (ReturnLoc.isInvalid())
    ReturnLoc = consumeToken(tok::kw_return);

  ...
}

Then directly call them from this recovery logic with the location. But that's probably not simple changes considering TopLevelCodeDecl handling.

So, I recommend to skip to the next statement, decl, or '}' using skipUntilDeclStmtRBrace(). Downside of this is that we loose the diagnostics and semantic syntax highlighting until the next statement. But I think that's better than lousy diagnostics.

@@ -2529,3 +2548,28 @@ ParserResult<Stmt> Parser::parseStmtPoundAssert() {
return makeParserResult<Stmt>(new (Context) PoundAssertStmt(
SourceRange(startLoc, endLoc), conditionExprResult.get(), message));
}

Optional<tok> Parser::closestMatchingStmtTokenForIdentifier(StringRef identifier) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make it a Parser method? If not, please make this a static function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, fixed

lib/Parse/ParseStmt.cpp Outdated Show resolved Hide resolved
@kitaisreal kitaisreal force-pushed the parser-statement-typo-corrections branch from 916866b to 346bf2a Compare January 11, 2020 15:42
Comment on lines 386 to 391
if (stmtTokMatch && Tok.isAny(tok::kw_let, tok::kw_var)) {
StringRef stmtTokenText = getTokenText(*stmtTokMatch);
diagnose(PreviousLoc, diag::statement_misspell, stmtTokenText)
.fixItReplace(PreviousLoc, stmtTokenText);
consumeToken();
skipUntilDeclStmtRBrace(tok::r_brace);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I meant something like:

if (stmtTokMatch) {
  ... diagnose ...
  if (Tok.isAny(tok::kw_let, tok::kw_var))
    consumeToken();
  skipUntilDeclStmtRBrace(tok::r_brance);
}

We still can recover non let/var binding cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rintaro Sure. Fixed.

@rintaro
Copy link
Member

rintaro commented Jan 17, 2020

Sorry for the delay. Could you add back test cases you deleted in previous commit?

func statementMisspelledAtTheBegginingOfTheLine() {
   let test = 5
   gaurd test > 5 else { return } // expected-error {{consecutive statements}} // expected-note {{did you misspell 'guard'}} {{3-8=guard}}
   i test > 5 { return } // expected-error {{consecutive statements}} // expected-note {{did you misspell 'if'}} {{3-4=if}}
   whle test > 5 { return } // expected-error {{consecutive statements}} // expected-note {{did you misspell 'while'}} {{3-7=while}}
 }

 func statementMisspelledNotAtTheBegginingOfTheLine() {
   let i = 5
   var x = i x=x+1 // expected-error {{consecutive statements}}
 }

These should work now.

Copy link
Member

@rintaro rintaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good!
For code-formatting, I recommend clang-format with git-clang-format.

@@ -870,7 +870,7 @@ class Parser {
IsFollowingGuard);
}
ParserResult<BraceStmt> parseBraceItemList(Diag<> ID);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there was trailing white space and xCode automatically trimmed it.

gaurd let firstWithLet = foo() else { return } // expected-error {{consecutive statements}}
// expected-note@-1 {{did you misspell 'guard'}} {{3-8=guard}}
// expected-error@-2 {{use of unresolved identifier 'gaurd'}}
i let secondWithLet = foo() { return } // expected-error {{consecutive statements}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the previous recovery skips i here. Because, use of unresolved identifier 'I' is not emitted.
Could you split test cases to their own functions? like:

func stmtMisspelledAtTheBegginingOfTheLineWithLet1() {
   gaurd let firstWithLet = foo() else { return }
}
func stmtMisspelledAtTheBegginingOfTheLineWithLet2() {
   i let secondWithLet = foo() { return }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Fixed.

std::pair<StringRef, tok> pair = stmtKeywords[i];
StringRef potentialStatement = pair.first;

unsigned dist = potentialStatement.edit_distance(identifier, /*AllowReplacements=*/true, /*MaxEditDistance=*/bestScore);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80 columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 380 to 383
SourceLoc startOfLineLocation = L->getLocForStartOfLine(SourceMgr, previousToken.getLoc());
SourceLoc locationOfTokenAtStartOfLine = Lexer::getTokenAtLocation(SourceMgr, startOfLineLocation).getLoc();
if (previousToken.getLoc() == locationOfTokenAtStartOfLine) {
stmtTokMatch = closestMatchingStmtTokenForIdentifier(previousToken.getText());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80 columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -194,6 +194,32 @@ static bool isAtStartOfSwitchCase(Parser &parser,
return parser.Tok.isAny(tok::kw_case, tok::kw_default);
}


static Optional<tok> closestMatchingStmtTokenForIdentifier(StringRef identifier) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

80 columns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Updated formatting.
Split tests for different statements.
@kitaisreal
Copy link
Contributor Author

Hi, @rintaro, what is status of this ticket ? Should be consider some different implementation ?

@kitaisreal kitaisreal requested a review from CodaFi March 23, 2020 11:58
@shahmishal
Copy link
Member

Please update the base branch to main by Oct 5th otherwise the pull request will be closed automatically.

  • How to change the base branch: (Link)
  • More detail about the branch update: (Link)

@shahmishal shahmishal closed this Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants