kconfig: fix ambiguous grammar in terms of new lines
This commit decreases 8 shift/reduce conflicts.
A certain amount of grammatical ambiguity comes from how to reduce
excessive T_EOL tokens.
Let's take a look at the example code below:
1 config A
2 bool "a"
3
4 depends on B
5
6 config B
7 def_bool y
The line 3 is melt into "config_option_list", but the line 5 can be
either a part of "config_option_list" or "common_stmt" by itself.
Currently, the lexer converts '\n' to T_EOL verbatim. In Kconfig,
a new line works as a statement terminator, but new lines in empty
lines are not critical since empty lines (or lines that contain only
whitespaces/comments) are just no-op.
If the lexer simply discards no-op lines, the parser will not be
bothered by excessive T_EOL tokens.
Of course, this means we are shifting the complexity from the parser
to the lexer, but it is much easier than tackling on shift/reduce
conflicts.
I introduced the second stage lexer to tweak the behavior.
Discard T_EOL if the previous token is T_EOL or T_HELPTEXT.
Two T_EOL tokens in a row is meaningless. T_HELPTEXT is a special
token that is reduced without T_EOL.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
diff --git a/scripts/kconfig/zconf.l b/scripts/kconfig/zconf.l
index b7bc164..847ba42 100644
--- a/scripts/kconfig/zconf.l
+++ b/scripts/kconfig/zconf.l
@@ -16,6 +16,8 @@
#include "lkc.h"
+#define YY_DECL static int yylex1(void)
+
#define START_STRSIZE 16
static struct {
@@ -23,6 +25,7 @@
int lineno;
} current_pos;
+static int prev_token = T_EOL;
static char *text;
static int text_size, text_asize;
@@ -268,6 +271,24 @@
}
%%
+
+/* second stage lexer */
+int yylex(void)
+{
+ int token;
+
+repeat:
+ token = yylex1();
+
+ /* Do not pass unneeded T_EOL to the parser. */
+ if ((prev_token == T_EOL || prev_token == T_HELPTEXT) && token == T_EOL)
+ goto repeat;
+
+ prev_token = token;
+
+ return token;
+}
+
static char *expand_token(const char *in, size_t n)
{
char *out;