Concept Location

Concept location is a part of Incremental Change process and it allows the programmers to determine an initial location of a change within the source code. Typically, concepts appear as nouns, verbs, or short clauses in the change request. These concepts are also embedded within the structures of the source code and appear as variables, classes, or methods. Concept location is the process that finds the implementations of these concepts [1]. Effective concept location techniques are crucial for software engineers since they provide the means for evolving large software systems without understanding the entire body of the code [2] and identify the place in the software where the change is to be made.

We distinguish two types of concepts: explicit and implicit. Explicit concepts are directly implemented in the code as variables, snippets of executable code, methods, classes, and so forth. Implicit concepts are assumptions that underlie parts of a code but not but are not directly implemented. For example, many applications assume there is only one user who is working with them and there is no specific code that can be identified as an implementation of this single user concept. If in a process of maintenance such an application is required to support multiple users, programmers would have to change the implicit concept of the user to the explicit one and this change requires a substantial effort [3]. Depending on the type of concepts, different concept location techniques may be applied.

While there are several different methodologies available for concept location, for example pattern-matching search that uses grep tool, static dependency search, information retreival (IR) etc. Currently, JRipples tool supports the dependency search technique [2], which is based on the traversal of the program dependencies in a manner that is similar to a depth-first search, but the search is conducted by the programmers rather than a computer. The static dependency search usually begins at the top class containing the main() or init() function. The programmers follow the static program dependencies making the decisions that guide the search. For example, in Java, class A depends on class B if class A refers to class B as a data member, local variable, argument, or data cast, or if class A inherits from class B, or if class A implements the interface of class B; in this situation, class A is called a dependent class and class B is called a supporting class. If the concept is not implemented in the chosen class, the programmers must determine which of the supporting classes leads to it. If none of the supporting classes lead to the desired concept, then a wrong turn must have been taken, the search backtracks to the previous class, and a different dependency is chosen. This proceeds until the concept is located within the program’s classes. Dependency search is suitable for the location of both implicit and explicit concepts.

Additional techniques are supported with JRipples modules as described in [2], see Lucene and grep.

Components, identified as Located during Concept Location, serve as an input to the Impact Analysis activity of the Incremental Change process.

DepIR

DepIR is a novel concept location technique supported by JRipples and its Lucene plug-in. DepIR combines dependency search with IR in such a way that it mimics the usual behavior of programmers, who interchangeably search and browse information. The IR component of DepIR supports textual searching, whereas the static dependency search supports browsing through the software. The following steps summarize how DepIR assists developers in locating concepts in source code. For convenience, we discuss DepIR on granularity of methods, yet the methodology is the same when using classes instead of methods. Figure depicts DepIR’s methodology, which is described here in detail.

Initial step of DepIR begins with the user formulating an IR query and investigating the results. Once the query is formulated and executed, the IR engine ranks the components of the system and presents them to the developer. The results are ranked in descending order based on their relevance (i.e., textual similarity) to the query. The programmers inspect the returned results and determine their relevance to the target concept. If an inspected component is a part of the target concept, then the concept is located and the developers halt the search procedure. If the component is not related to the target concept, then the search continues to the method with the next highest rank. Alternatively, while inspecting the ranked list of the results, the developers may encounter a component, which does not implement the concept, but still is related to the target concept. In this case, the programmers begin to explore the dependency graph using the dependency search technique, having the inspected component as the entry point of the search. Also, while studying the code of a focus component, the programmers might encounter new terms or facts about the target concept; in this case, the programmers reformulate the query and re-rank the system.

The developers explore the dependency graph by investigating components in the neighborhood of the chosen starting point. At each step, the programmers investigate the current focus component; if they determine that the focus component does not implement the target concept, the search continues to one of the adjacent components or backtracks to the previous focus component. While choosing among the adjacent components, the programmers follow the ranking computed for these components based on the current query. If the programmers decide to re-formulate the query and update the ranking of the system while inspecting one of the focus methods, the ranking and the order of inspection of the adjacent nodes will be updated as well. Additionally, at any moment during browsing the dependency graph, the programmers can decide to terminate the dependency search and return to the ranked list of all components in the system. DepIR keeps track of the entire search and navigation path.

The process of reformulating the queries and exploring the dependency graph continues until the target concept is located. Note that once the programmers examine a component during the IR or the dependency search part of DepIR and decide that this method does not implement the target concept, the component is removed from the search and ignored until the concept is located. This ensures that the same method is not inspected twice, and helps to avoid cycles in dependency graph.

References

[1] Rajlich, V. and Wilde, N., "The Role of Concepts in Program Comprehension", in Proceedings of IEEE International Workshop on Program Comprehension (IWPC'02), 2002, pp. 271-278.
[2] Marcus, A., Rajlich, V., Buchta, J., Petrenko, M., and Sergeyev, A., "Static Techniques for Concept Location in Object-Oriented Code", in Proceedings of 13th IEEE International Workshop on Program Comprehension (IWPC'05), 2005, pp. 33-42.
[3] Rajlich, V. and Gosavi, P., "Incremental Change in Object-Oriented Programming", in IEEE Software, 2004, pp. 2-9.