This issue relates to Unicode escapes, described in section 3.3 of the JLS. javac interprets Unicode escapes during the reading of ASCII characters from source. Later on, javac interprets escape sequences, described in section 3.7 of the JLS, during the tokenization of character literals, string literals, and text blocks. Escape sequences are only indirectly affected by this bug.
During reading, a normal backslash (that is, the ASCII `\` character, not the corresponding Unicode escape `\u005c`) followed by another normal backslash is treated collectively as a pair of backslash characters. No further interpretation is done. This means that if a normal backslash immediately precedes the sequence `\` `u` `A` `B` `C` `D`, which would "normally" be interpreted as a Unicode escape, then the interpretation of that sequence as a Unicode escape is suppressed.
For example, the sequence `\u2022` would be interpreted as the `•` character, whereas `\\u2022` would be interpreted as the seven characters `\` `\` `u` `2` `0` `2` `2`.
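The difference can be observed from ordinary Java code. The sketch below is illustrative only (the class and field names are made up for this example) and relies on long-standing behaviour that is not affected by this bug. Note that the second literal holds seven characters after reading, but the lexer then treats the leading `\\` as the escape sequence for a single backslash, so the resulting string has six characters.

```java
public final class BulletEscapeDemo {
    // The Unicode escape is translated to the single bullet character
    // while the source is being read, before the literal is tokenized.
    static final String TRANSLATED = "\u2022";

    // The pair of normal backslashes suppresses the translation. The lexer
    // later reads the pair as the escape sequence for one backslash.
    static final String SUPPRESSED = "\\u2022";

    public static void main(String[] args) {
        System.out.println(TRANSLATED.length()); // 1
        System.out.println(SUPPRESSED.length()); // 6
        // First character of SUPPRESSED is a backslash, the second is 'u'.
        System.out.println(SUPPRESSED.charAt(0) == (char) 0x5c);
        System.out.println(SUPPRESSED.charAt(1) == 'u');
    }
}
```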
An issue arises when Java developers choose to use a Unicode escape backslash `\u005c` in their source code, instead of a normal backslash. Prior to JDK 16, if the Unicode escape backslash was followed by a second Unicode escape, then the second Unicode escape was always interpreted. The normal backslash at the beginning of the second Unicode escape (immediately followed by `u`) was not paired with the preceding Unicode escape backslash. Otherwise, any following normal backslash was paired with the `\u005c`.
For example, the sequence `\u005c\u2022` would be interpreted as `\` and `•`, whereas `\u005c\tXYZ` would be interpreted as `\` `\` `t` `X` `Y` `Z`.
The bug in JDK 16 was that `\u005c` was ignored as having any effect on Unicode interpretation. Using the example from compiler-dev discussions, `\u005c\\u005d`:
- Prior to JDK 16, it was interpreted as `\` `\` `]`
- JDK 16 interpreted it as `\` `\` `\` `u` `0` `0` `5` `d`, which would produce a syntax error downstream in the lexer because the escape sequence `\u` is invalid.
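The pairing rules described above can be approximated with a small toy model. This is a sketch of the behaviour as described in this PR, not javac's actual `UnicodeReader`: the class name, the method names, and the `pairEscapedBackslash` flag are all hypothetical, and malformed escapes are simply passed through rather than reported as errors. With the flag set to `true` the model follows the pre-JDK 16 description; with `false` it follows the JDK 16 bug.

```java
public final class EscapeReaderSketch {
    private static final char BS = (char) 0x5c; // the backslash character

    // Length of a Unicode escape starting at index i (backslash, one or
    // more 'u', four hex digits), or -1 if there is none at that position.
    static int escapeLength(String raw, int i) {
        if (i >= raw.length() || raw.charAt(i) != BS) return -1;
        int j = i + 1;
        while (j < raw.length() && raw.charAt(j) == 'u') j++;
        if (j == i + 1 || j + 4 > raw.length()) return -1;
        for (int k = j; k < j + 4; k++) {
            if (Character.digit(raw.charAt(k), 16) < 0) return -1;
        }
        return j + 4 - i;
    }

    // Translates raw source characters according to the toy model.
    static String translate(String raw, boolean pairEscapedBackslash) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            int len = escapeLength(raw, i);
            if (len > 0) {
                // Interpret the Unicode escape.
                char ch = (char) Integer.parseInt(raw.substring(i + len - 4, i + len), 16);
                out.append(ch);
                i += len;
                // An escape-produced backslash pairs with a following normal
                // backslash, unless that backslash begins another escape.
                if (ch == BS && pairEscapedBackslash
                        && i < raw.length() && raw.charAt(i) == BS
                        && escapeLength(raw, i) < 0) {
                    out.append(BS);
                    i++;
                }
            } else if (raw.charAt(i) == BS) {
                // A normal backslash pairs with an immediately following
                // normal backslash; the pair is not interpreted further.
                out.append(BS);
                i++;
                if (i < raw.length() && raw.charAt(i) == BS) {
                    out.append(BS);
                    i++;
                }
            } else {
                out.append(raw.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String bs = String.valueOf(BS);
        // The compiler-dev example, built by concatenation so that this
        // source file itself contains no Unicode escapes.
        String raw = bs + "u005c" + bs + bs + "u005d";
        System.out.println(translate(raw, true));  // model of pre-JDK 16 behaviour
        System.out.println(translate(raw, false)); // model of the JDK 16 bug
    }
}
```

The raw inputs are assembled from a backslash constant rather than written as literals, because Unicode escapes in the demo's own source would be translated by javac before the model ever saw them.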
Progress
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [x] Change must be properly reviewed
Issue
- JDK-8269150: UnicodeReader not translating \u005c\u005d to \]
Reviewers
- Jonathan Gibbons (@jonathan-gibbons - Reviewer) ⚠️ Review applies to 67efad992c03de1b7d3556e6b4df53f0ce2662b1
- Jan Lahoda (@lahodaj - Reviewer) ⚠️ Review applies to 67efad992c03de1b7d3556e6b4df53f0ce2662b1
- Joe Darcy (@jddarcy - Reviewer)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk17 pull/126/head:pull/126
$ git checkout pull/126
Update a local copy of the PR:
$ git checkout pull/126
$ git pull https://git.openjdk.java.net/jdk17 pull/126/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 126
View PR using the GUI difftool:
$ git pr show -t 126
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk17/pull/126.diff
integrated compiler