The Boost C++ Libraries

Parsers

This section explains how you define parsers. You usually access existing parsers from Boost.Spirit – for example, boost::spirit::ascii::digit or boost::spirit::ascii::space. By combining parsers, you can parse more complex formats. The process is similar to defining regular expressions, which are also built from basic building blocks.

Example 11.5. A parser for two consecutive digits
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>

using namespace boost::spirit;

int main()
{
  std::string s;
  std::getline(std::cin, s);
  auto it = s.begin();
  bool match = qi::phrase_parse(it, s.end(), ascii::digit >> ascii::digit,
    ascii::space);
  std::cout << std::boolalpha << match << '\n';
  if (it != s.end())
    std::cout << std::string{it, s.end()} << '\n';
}

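Example 11.5 tests whether the input starts with two digits. boost::spirit::qi::phrase_parse() returns true only if it finds two digits one after the other. Because boost::spirit::ascii::space is passed as the skipper, spaces before, between, and after the digits are ignored. Any input left over after the match is written to standard output.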

As with the previous examples, boost::spirit::ascii::digit is used to recognize digits. Because boost::spirit::ascii::digit tests exactly one character, the parser is used twice to test the input for two digits. To apply boost::spirit::ascii::digit twice in a row, the two parsers have to be joined with an operator. Boost.Spirit overloads operator>> for parsers. ascii::digit >> ascii::digit creates a parser that tests whether the input starts with two digits.

If you run the example and enter two digits, true is displayed. If you enter only one digit, the example displays false.

Please note that the example also displays true if you enter a space between two digits. Wherever operator>> is used in a parser, characters that the skipper ignores are allowed. Because Example 11.5 uses boost::spirit::ascii::space as the skipper, you may enter as many spaces as you like between the two digits.

If you want the parser to accept two digits only if they follow each other with no space in between, use boost::spirit::qi::parse() or the directive boost::spirit::qi::lexeme.

Example 11.6. Parsing character by character with boost::spirit::qi::lexeme
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>

using namespace boost::spirit;

int main()
{
  std::string s;
  std::getline(std::cin, s);
  auto it = s.begin();
  bool match = qi::phrase_parse(it, s.end(),
    qi::lexeme[ascii::digit >> ascii::digit], ascii::space);
  std::cout << std::boolalpha << match << '\n';
  if (it != s.end())
    std::cout << std::string{it, s.end()} << '\n';
}

Example 11.6 uses the parser qi::lexeme[ascii::digit >> ascii::digit]. Now, boost::spirit::qi::phrase_parse() only returns true if the digits have no spaces between them.

boost::spirit::qi::lexeme is one of several directives that can change the behavior of parsers. A directive is applied by placing a parser in square brackets after it. Use boost::spirit::qi::lexeme if characters that the skipper would otherwise ignore must not appear where operator>> is used.

Example 11.7. Boost.Spirit rules similar to regular expressions
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>

using namespace boost::spirit;

int main()
{
  std::string s;
  std::getline(std::cin, s);
  auto it = s.begin();
  bool match = qi::phrase_parse(it, s.end(), +ascii::digit, ascii::space);
  std::cout << std::boolalpha << match << '\n';
  if (it != s.end())
    std::cout << std::string{it, s.end()} << '\n';
}

Example 11.7 defines a parser with +ascii::digit, which expects at least one digit. This syntax, in particular the plus sign (+), is similar to that used in regular expressions, except that the plus sign is placed in front of the parser because C++ only supports operator+ as a prefix operator. The plus sign identifies a character or character group that is expected to occur at least once. If you start the example and enter at least one digit, true is displayed. It doesn’t matter whether digits are delimited by spaces. If the parser should accept only digits without spaces, use boost::spirit::qi::lexeme again.

Example 11.8. Numeric parsers
#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>

using namespace boost::spirit;

int main()
{
  std::string s;
  std::getline(std::cin, s);
  auto it = s.begin();
  bool match = qi::phrase_parse(it, s.end(), qi::int_, ascii::space);
  std::cout << std::boolalpha << match << '\n';
  if (it != s.end())
    std::cout << std::string{it, s.end()} << '\n';
}

Example 11.8 expects an integer. boost::spirit::qi::int_ is a numeric parser that can recognize positive and negative integers. Unlike boost::spirit::ascii::digit, boost::spirit::qi::int_ can recognize several characters, such as +1 or -23, as integers.

Boost.Spirit provides additional parsers. boost::spirit::qi::float_, boost::spirit::qi::double_, and boost::spirit::qi::bool_ are numeric parsers that can read floating point numbers and boolean values. With boost::spirit::qi::eol, you can test for an end-of-line character. boost::spirit::qi::byte_ reads one byte, and boost::spirit::qi::word reads two bytes. boost::spirit::qi::word and other binary parsers that are not tied to a byte order use the endianness of the platform. If you want to parse based on a specific endianness, regardless of the platform, you can use parsers like boost::spirit::qi::little_word and boost::spirit::qi::big_word.