Domain Specific Language Research
|Kauffman Incubator Project
|Copyright © 2019 edegan.com. All Rights Reserved.
The objective of this research was to determine whether, and if so how, to implement a Domain-Specific Language for the Listing Page extractor component of the project.
E:\projects\Kauffman Incubator Project\01 Classify entrepreneurship ecosystem organizations\RajanLasya_DSLResearch_03.15
In contrast to general-purpose programming languages (GPLs), domain-specific languages (DSLs) are created to solve problems within a specific domain. While GPLs provide broad functionality, some domains have a distinctive structure that can be better modelled by dedicated abstractions and notations. In addition, the target solution in the domain might not require the full expressive power and overhead of a Turing-complete GPL. When presented with such a domain and such a target solution, a DSL can be a powerful tool.
The specificity of a DSL provides several key advantages. Domain-specific constructs can be expressed directly in the language, which can improve both efficiency and accuracy of output. Efficiency can be increased by creating notation that reduces redundancy for repetitive operations within the domain. Specialized compilers and error-checkers can be programmed to enforce domain constraints, improving accuracy of output. Beyond their performance, a subset of DSLs called application domain DSLs can be useful for facilitating program interaction with non-programmers. For example, Gherkin, the software testing DSL used by Cucumber (a tool originally written in Ruby), lets tests be written in a structured natural-language syntax that is then mapped to executable test steps. Through their ability to create unique idiomatic expressions, DSLs can allow domain experts to interact with data and processing through domain-specific functions and notation.
DSL development also presents a number of disadvantages. Because DSLs require both domain expertise and programming expertise, they are difficult to create effectively and to maintain long-term over a large user base. Many DSLs also add no new functionality over a GPL and offer no additional efficiency; these tend to be scripts that simply “hide” the usage of libraries. However, if a target solution is specific enough, and the abstraction of the domain into the DSL is simple enough, then a true DSL can be built efficiently.
As suggested by the concept diagram, a DSL could be used to express the output of an HTML parser that simplifies a web page into a tree structure. (This is in the “Information Detector” cloud of the current version of the diagram, as of 3/15/19.) This is an opportunity for a concise mark-up based DSL. The possible steps in creating this DSL would be:
- Determine a host language. The language’s abstractions should be similar enough to the domain abstractions that the domain can be concisely implemented. Though I’m unfamiliar with many languages at a level this specific, I would recommend Python, given the relative simplicity of this DSL.
- Write a concrete syntax. This should cover all the features the language supports. For this, we would likely borrow and simplify HTML syntax. The “stack,” “row,” and “footer” elements included in the example could represent categories of DOM elements, depending on how detailed we want this abstraction to be.
- Write the grammar. Many parsing libraries accept grammars expressed in EBNF (Extended Backus-Naur Form), so defining the whole grammar in this format would be efficient.
- Run a parsing library on the grammar expressed in EBNF.
- The output of the specific parsing library will determine the next steps. In the example I am looking at, the parsing library generates a simple parse tree that is then interpreted by a simple Python function. However, if a more complex compiler is necessary, then this would be the point at which to write the compiler that turns the parse tree into efficient byte code.
- With the appropriate linking statements, this simply formulated DSL should run as a call from a Python program. There will likely be three files: two Python files and one file written in our DSL. The first Python file will contain the implementation of our DSL; the second Python file will be the Python module that the DSL source calls and executes indirectly through that implementation; the third file will be the DSL source file written by users.
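The steps above can be sketched end to end in a few dozen lines of Python. This is only a minimal illustration under stated assumptions, not the project's implementation: the tag-style concrete syntax is a guess at what "borrowed and simplified HTML" might look like, the element names are the “stack,” “row,” and “footer” from the example, a small hand-written recursive-descent parser stands in for a real parsing library, and every function name (`tokenize`, `parse`, `interpret`) is hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumed grammar, in EBNF (an illustration, not a settled design):
#   document = element ;
#   element  = "<" NAME ">" { element | TEXT } "</" NAME ">" ;
#   NAME     = "stack" | "row" | "footer" ;
#   TEXT     = any run of characters other than "<" ;
ALLOWED = {"stack", "row", "footer"}
TOKEN_RE = re.compile(r"<(/?)(\w+)>|([^<]+)")

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)  # Node objects or str

def tokenize(source):
    """Yield ('open', name), ('close', name) or ('text', value) tokens."""
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.group(2):
            yield ("close" if m.group(1) else "open", m.group(2))
        elif m.group(3).strip():  # skip whitespace-only text
            yield ("text", m.group(3).strip())

def _parse_element(tokens, i):
    """Parse one element starting at tokens[i]; return (Node, next index)."""
    if i >= len(tokens) or tokens[i][0] != "open":
        raise SyntaxError("expected an opening tag")
    name = tokens[i][1]
    if name not in ALLOWED:  # domain constraint enforced at parse time
        raise SyntaxError(f"unknown element <{name}>")
    node = Node(name)
    i += 1
    while i < len(tokens) and tokens[i][0] != "close":
        if tokens[i][0] == "text":
            node.children.append(tokens[i][1])
            i += 1
        else:
            child, i = _parse_element(tokens, i)
            node.children.append(child)
    if i >= len(tokens) or tokens[i][1] != name:
        raise SyntaxError(f"missing </{name}>")
    return node, i + 1

def parse(source):
    """Turn DSL source text into a parse tree rooted at a single element."""
    tokens = list(tokenize(source))
    node, end = _parse_element(tokens, 0)
    if end != len(tokens):
        raise SyntaxError("trailing input after the root element")
    return node

def interpret(node, path=()):
    """Walk the parse tree and collect (element path, text) pairs."""
    here = path + (node.name,)
    records = []
    for child in node.children:
        if isinstance(child, str):
            records.append(("/".join(here), child))
        else:
            records.extend(interpret(child, here))
    return records

# Stand-in for the user-written DSL source file.
SOURCE = """
<stack>
  <row>Incubator A</row>
  <row>Incubator B</row>
  <footer>Page 1 of 3</footer>
</stack>
"""
print(interpret(parse(SOURCE)))
# [('stack/row', 'Incubator A'), ('stack/row', 'Incubator B'), ('stack/footer', 'Page 1 of 3')]
```

In the three-file layout described above, `SOURCE` would live in the DSL source file, `tokenize`, `parse`, and `interpret` in the implementation file, and the last line in the calling Python program. Rejecting unknown element names inside the parser is one way a DSL can enforce domain constraints before any processing happens.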
Time and Feasibility of Development
The time required for this development would vary depending on the complexity of the DSL's structure. For the example in “Project Goal v2,” I would assume a rough estimate of 25-30 hours for one person with no prior knowledge of developing a DSL, allotting 5-10 hours for debugging and thorough unit testing. Whether a compiler or interpreter would need to be written would also be a significant variable in the total time necessary to develop the DSL. However, from my preliminary research, I believe the attributes of the above DSL are a good fit to express the output of the proposed HTML parser, and developing such a DSL would be a manageable and achievable goal.