目录

Longest Run Subsequence

Implementation of a solver for the Longest Run Subsequence Problem. Given a sequence as input, compute a longest subsequence such that there is at most one run for every character.

Example

A longest run subsequence of the string ccbcbbbdaaddd is cccbbbaaddd.

Algorithms

Depending on the properties of an instance the solver uses one of two algorithms to solve the problem. For long strings with small alphabets a dynamic programming approach is used, while short strings with large alphabets are solved via Integer Linear Programming. Every input instance is processed by reduction rules first to split it into smaller instances, if possible. Details can be found in [1]. Please consider citing this paper if you find the implementation useful for your work.

Installation

The Integer Linear Program algorithm is only available if PuLP is installed on the system. PuLP is a free API for modelling linear programs and available on PyPI or conda.

Usage

To solve Longest Run Subsequence instances, the function lrs has to be imported from the module.

Example code::

from longestrunsubsequence import lrs
print(lrs('ccbcbbbdaaddd'))
> [0, 1, 3, 4, 5, 6, 8, 9, 10, 11, 12]

The output is a list of indices, which represent the elements of the longest subsequence. The input can be a string or a list with arbitrary elements.

References

[1] Schrinner, S., Goel, M., Wulfert, M., Spohr, P., Schneeberger, K., Klau, G.W.: The Longest Run Subsequence Problem. In: Kingsford, C., Pisanti, N. (eds.) 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), vol. 172, pp. 6–1613. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, Germany (2020). doi:10.4230/LIPIcs.WABI.2020.6. https://drops.dagstuhl.de/opus/volltexte/2020/12795

关于

用于计算字符串的最长游程子序列的算法库

86.0 KB
邀请码
    Gitlink(确实开源)
  • 加入我们
  • 官网邮箱:gitlink@ccf.org.cn
  • QQ群
  • QQ群
  • 公众号
  • 公众号

版权所有:中国计算机学会技术支持:开源发展技术委员会
京ICP备13000930号-9 京公网安备 11010802032778号