Scala combinator parser



This is my attempt to get you started with Scala parser combinators. They are a powerful tool that lets you build your own lexers and parsers in plain Scala code, much like what yacc and similar tools generate from a grammar file. Areas of application include:

a) a calculator that accepts an input string such as "3+2"

b) matching specific text in a given string, such as finding "fox" in the string "the brown fox jumps over the fence"

I'm going to use an example from the references below. Here we define a class SimpleParser that extends RegexParsers. It declares a parser called "word", which is built from a regular expression and matches a single lowercase word.

Simple Parser

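This code is taken from https://wiki.scala-lang.org/. Since Scala 2.11, the parser combinators have been published as a separate library, scala-parser-combinators. You may need to add it as a dependency before the import below compiles.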

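import scala.util.parsing.combinator._

// RegexParsers lets you build parsers from string literals and regular expressions
class SimpleParser extends RegexParsers {
  // Matches one or more lowercase letters; ^^ maps the matched text to the result
  def word: Parser[String] = """[a-z]+""".r ^^ { _.toString }
}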

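In the code above, """[a-z]+""" is a raw string literal. The triple quotes mean backslashes are not treated as escape characters, which is convenient for regular expressions. Calling .r on the string turns it into a Regex, which RegexParsers accepts as a parser.

^^ means that if the parser succeeds, the function that follows it is applied to the matched result. Here that function is { _.toString }. Since the regex already produces a String, _.toString leaves it unchanged; it simply shows where a transformation would go.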
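Here is a small sketch that illustrates both points. The object TransformDemo and its parsers number and shout are hypothetical names, not part of the original wiki example. The raw string lets you write \d without doubling the backslash, and ^^ can turn the matched text into any type, not just a String:

```scala
import scala.util.parsing.combinator._

// Hypothetical example: ^^ transforms whatever the parser matched
object TransformDemo extends RegexParsers {
  // In a raw string \d needs no extra backslash (an ordinary string would need "\\d+");
  // ^^ converts the matched digits from String to Int
  def number: Parser[Int] = """\d+""".r ^^ { _.toInt }

  // The matched word is upper-cased before being returned
  def shout: Parser[String] = """[a-z]+""".r ^^ { _.toUpperCase }

  def main(args: Array[String]): Unit = {
    println(parse(number, "42").get + 1)  // 43
    println(parse(shout, "johnny").get)   // JOHNNY
  }
}
```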

// Extending SimpleParser gives access to word and to the inherited parse method
object TestSimpleParser extends SimpleParser {
  def main(args: Array[String]): Unit = println(parse(word, "johnny come lately"))
}

Here we call the parse method and give it "word" as the entry point. RegexParsers declares several overloads of parse. The one that takes a Reader[Char] has the following signature:

  def parse[T](p: Parser[T], in: Reader[Char]): ParseResult[T] = p(in)

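You don't need to worry about it for now. When you pass a String, another overload wraps it in a CharSequenceReader and runs the parser on it in the same way.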
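The sketch below shows what the signature means. A Parser is essentially a function from a Reader[Char] to a ParseResult. Calling parse with a string, calling it with a reader, and applying the parser to the reader directly should therefore all give the same result. ParseInputs is a hypothetical name used only for this example.

```scala
import scala.util.parsing.combinator._
import scala.util.parsing.input.CharSequenceReader

// Hypothetical example: three equivalent ways to run the same parser
object ParseInputs extends RegexParsers {
  def word: Parser[String] = """[a-z]+""".r

  def main(args: Array[String]): Unit = {
    val reader = new CharSequenceReader("johnny come lately")
    // String overload: the input is wrapped in a CharSequenceReader internally
    println(parse(word, "johnny come lately").get)  // johnny
    // Reader[Char] overload, the signature shown above (parse(p, in) = p(in))
    println(parse(word, reader).get)                // johnny
    // A Parser is a function from Input to ParseResult, so it can be applied directly
    println(word(reader).get)                       // johnny
  }
}
```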

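Running TestSimpleParser prints a successful result for "johnny" (something like [1.7] parsed: johnny). parse only needs a prefix of the input to match, so word consumes the first word of "johnny come lately" and leaves the rest unparsed.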
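If you want to insist that the whole input is consumed, RegexParsers also provides parseAll. Here is a minimal sketch of the difference; PrefixDemo and words are hypothetical names.

```scala
import scala.util.parsing.combinator._

// Hypothetical example contrasting parse and parseAll
object PrefixDemo extends RegexParsers {
  def word: Parser[String] = """[a-z]+""".r
  // rep(word) matches a whole sequence of words
  def words: Parser[List[String]] = rep(word)

  def main(args: Array[String]): Unit = {
    val input = "johnny come lately"
    // parse is satisfied once a prefix of the input matches
    println(parse(word, input).successful)     // true
    // parseAll also requires the entire input to be consumed
    println(parseAll(word, input).successful)  // false
    println(parseAll(words, input).get)        // List(johnny, come, lately)
  }
}
```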


Word Count Parser

Next we are going to look at another example. This time, we will count how many words there are in a string.

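import scala.util.parsing.combinator._

class SimpleParser extends RegexParsers {
  // Returns the number of words collected by the repetition
  def count(a: List[String]): Int = a.length

  // A word is one or more lowercase letters
  def word: Parser[String] = """[a-z]+""".r ^^ { _.toString }

  // word.* matches zero or more words (RegexParsers skips the whitespace
  // between them by default) and yields a List[String]; ^^ turns it into a count
  def freq: Parser[Int] = word.* ^^ { count(_) }
}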



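object TestSimpleParser extends SimpleParser {
  def main(args: Array[String]): Unit = {
    // parse returns a ParseResult, which is one of Success, Failure or Error
    parse(freq, "johnny and the love of the world") match {
      case Success(matched, _) => println(matched)            // the word count
      case Failure(msg, _)     => println("FAILURE: " + msg)  // failed; alternatives may still be tried
      case Error(msg, _)       => println("ERROR: " + msg)    // fatal error, no backtracking
    }
  }
}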


This time we call parse with "freq" as the entry point and pattern-match on the result.

When you execute it, you should get 7.
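Here is a somewhat more compact variant, offered as a sketch rather than the original author's code. rep(word) is the method form of word.*, and ^^ can apply _.length directly without a helper function. The example also shows two consequences of the setup above. First, whitespace between words is skipped automatically. Second, parse stops quietly at the first token that word cannot match, here the capitalized "The". WordCounter and wordCount are hypothetical names.

```scala
import scala.util.parsing.combinator._

// Hypothetical variant of the word counter above
object WordCounter extends RegexParsers {
  // Same word definition as before: one or more lowercase letters
  def word: Parser[String] = """[a-z]+""".r

  // rep(word) is the method form of word.*; ^^ maps the List[String] to its length
  def wordCount: Parser[Int] = rep(word) ^^ (_.length)

  def main(args: Array[String]): Unit = {
    println(parseAll(wordCount, "johnny and the love of the world").get)  // 7
    // Runs of spaces, tabs and newlines between words are skipped by default
    println(parseAll(wordCount, "johnny  and\tthe\nlove").get)            // 4
    // parse stops at "The" (not lowercase) and reports only the words before it
    println(parse(wordCount, "johnny and The love").get)                  // 2
    // parseAll reports that the input was not fully consumed
    println(parseAll(wordCount, "johnny and The love").successful)        // false
  }
}
```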

Some symbols of interest:

p1 ~ p2 - sequencing: matches p1 followed by p2; the result has the form a ~ b
p1 | p2 - alternation: matches either p1 or p2, trying p1 first and p2 only if p1 fails
p1.? - option: optionally matches p1; the result is an Option
p1.* - repetition: matches zero or more repetitions of p1; the result is a List
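To see the four symbols working together, here is a minimal calculator sketch for application (a) from the introduction. It assumes integers combined with + and - only, evaluated left to right with no operator precedence. Calculator, number and expr are hypothetical names.

```scala
import scala.util.parsing.combinator._

// Hypothetical calculator combining ~, |, .? and .*; supports + and - on integers
object Calculator extends RegexParsers {
  // .? makes the leading minus sign optional, so sign is an Option[String]
  def number: Parser[Int] = "-".? ~ """\d+""".r ^^ {
    case sign ~ digits => if (sign.isDefined) -digits.toInt else digits.toInt
  }

  // ~ sequences parsers, | picks an operator, .* repeats the (operator, number) pair
  def expr: Parser[Int] = number ~ (("+" | "-") ~ number).* ^^ {
    case first ~ rest =>
      // Apply each operator from left to right
      rest.foldLeft(first) {
        case (acc, "+" ~ n) => acc + n
        case (acc, _ ~ n)   => acc - n
      }
  }

  def main(args: Array[String]): Unit =
    parseAll(expr, "3+2") match {
      case Success(value, _) => println(value)
      case Failure(msg, _)   => println("FAILURE: " + msg)
      case Error(msg, _)     => println("ERROR: " + msg)
    }
}
```

For "3+2" this should print 5. Supporting * and / would need a separate rule for terms so that they bind more tightly than + and -.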

Where to go from here? Try this site for more examples.


References

To get a better understanding of how it works, please refer to here and here.
