Spirit tries hard to make dealing with attributes easy, but sometimes, it just gets in the way.
For larger Spirit-based projects, organizing the source code well can lead to more efficient builds and increased maintainability.
Of course, this is true for any project. But the heavily templated nature of even a fully realized Spirit parser makes this doubly so. Figuring out how to take advantage of separate compilation while maintaining the ability for each of the pieces to see the needed type/template information is not trivial.
Once your parser grammar grows beyond a few rules/parsers, handling errors will become a priority. Being able to give feedback about where things went wrong, what exactly went wrong, and possible fixes are all things you would like to provide. It might also be nice to see if you could recover the parsing process from the point of failure and continue parsing to maybe find other problems.
Last time, we looked at the lexer and supporting stuff. This time, we will look at the primitive parser and final usage.
Back in this post, I said about Spirit: "…it would be very feasible to write a lexical analyzer that makes a token object stream available via a ForwardIterator and write your grammar rules based on that." But is it really? The short answer is: yes, it is feasible, but probably not a good idea. The long answer is the journey we’ll take over the next two posts.
This time around, we will use a custom parser to handle the keywords. I really hadn’t planned on making this a series, but there you go. This will be the last, I think.

Upgrades

I started from the code from the last post, but did make a minor adjustment. I made underbar ('_') a valid character in an identifier.

auto const ualnum = alnum | char_('_');
auto const reserved = lexeme[symtab >> !
The ink hadn’t dried on my Identifier Parsing post when I realized that there was indeed a better way to handle multiple keywords. In that post I stated that a symbols<T> parser would not help because it suffered the same problem as lit(). Which is true. What I missed was that, of course, you could use the same trick with symbols as you did with lit() to make it work.
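To make the trick concrete, here is a minimal sketch in plain standard C++ (deliberately not Spirit code) of what I understand it to be: match the longest keyword from a table, then reject the match if the next character could continue an identifier. The names `is_ident_char` and `match_keyword` are hypothetical, made up for this illustration, and treating '_' as an identifier character is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// True for characters that may continue an identifier: letters, digits, '_'.
// (Counting '_' as an identifier character is an assumption of this sketch.)
inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical helper: returns the length of the keyword found at the start
// of `input`, or 0 if there is no keyword there.
inline std::size_t match_keyword(std::string_view input,
                                 const std::vector<std::string>& keywords) {
    // Step 1: find the longest keyword that is a prefix of the input,
    // which is roughly what a symbol-table lookup does.
    std::size_t best = 0;
    for (const std::string& kw : keywords) {
        if (kw.size() > best && input.substr(0, kw.size()) == kw) {
            best = kw.size();
        }
    }
    if (best == 0) {
        return 0;
    }
    // Step 2: the boundary check. If an identifier character follows,
    // the text is an identifier that merely starts with a keyword.
    if (best < input.size() && is_ident_char(input[best])) {
        return 0;
    }
    return best;
}
```

With a table of `if`, `while`, and `int`, the input `while (x)` matches five characters, but `whileloop` and `integer` match nothing, so they stay available to the identifier rule.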
In Boost.Spirit X3, parsing identifiers is a bit tricky. If you are used to the distinction between lexical analysis and syntactical analysis (as I am), Spirit can take some getting used to. Lexical analysis is done in the same grammar as the syntactical analysis. So the ubiquitous IDENT token type is now a grammar rule. To be sure, it doesn’t have to be this way. Spirit parsers work on iterator pairs, so it would be very feasible to write a lexical analyzer that makes a token object stream available via a ForwardIterator and write your grammar rules based on that.
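As a rough illustration of the separate-lexer idea, here is a small sketch in plain standard C++ that turns source text into a sequence of token objects. It is not Spirit code: `TokenKind`, `Token`, and `tokenize` are hypothetical names, the token categories are assumptions, and it builds a `std::vector` eagerly instead of lexing lazily. The vector's iterators do satisfy ForwardIterator, so the resulting range is the kind of iterator pair a token-based grammar would consume.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token categories for this sketch.
enum class TokenKind { Ident, Number, Punct };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits `src` into tokens, skipping whitespace. Identifiers start with a
// letter or '_' and continue with letters, digits, or '_' (an assumption).
inline std::vector<Token> tokenize(std::string_view src) {
    auto uc = [](char c) { return static_cast<unsigned char>(c); };
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (std::isspace(uc(src[i]))) {
            ++i;  // whitespace separates tokens but is not a token itself
            continue;
        }
        const std::size_t start = i;
        TokenKind kind;
        if (std::isalpha(uc(src[i])) || src[i] == '_') {
            kind = TokenKind::Ident;
            while (i < src.size() && (std::isalnum(uc(src[i])) || src[i] == '_')) {
                ++i;
            }
        } else if (std::isdigit(uc(src[i]))) {
            kind = TokenKind::Number;
            while (i < src.size() && std::isdigit(uc(src[i]))) {
                ++i;
            }
        } else {
            kind = TokenKind::Punct;  // any other single character
            ++i;
        }
        out.push_back(Token{kind, std::string(src.substr(start, i - start))});
    }
    return out;
}
```

Calling `tokenize("x1 = 42;")` yields four tokens: an identifier, a punctuation token, a number, and another punctuation token. The grammar rules would then be written against `Token` values instead of characters.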