Safe Haskell | Safe |
---|---|
Language | Haskell98 |
Bio.Sequence.SFF_filters
Description
This implements a number of filters used in the Titanium pipeline, based on published documentation.
- type DiscardFilter = ReadBlock -> Bool
- discard_empty :: DiscardFilter
- discard_key :: String -> DiscardFilter
- discard_dots :: Double -> DiscardFilter
- discard_mixed :: DiscardFilter
- discard_length :: Int -> DiscardFilter
- type TrimFilter = ReadBlock -> ReadBlock
- trim_sigint :: TrimFilter
- sigint :: ReadBlock -> Int
- trim_primer :: String -> TrimFilter
- find_primer :: String -> ReadBlock -> Int
- trim_qual20 :: Int -> TrimFilter
- qual20 :: Int -> ReadBlock -> Int
- dlength :: [a] -> Double
- avg :: Integral a => [a] -> Double
- clipFlows :: ReadBlock -> Int -> ReadBlock
- clipSeq :: ReadBlock -> Int -> ReadBlock
- flx_linker :: String
- ti_linker :: String
- rna_adapter :: String
- rna_adapter2 :: String
- rna_adapter3 :: String
- rapid_adapter :: String
- ti_adapter_b :: String
Discarding filters
type DiscardFilter = ReadBlock -> Bool
DiscardFilters determine whether a read is to be retained or discarded.
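Because a DiscardFilter is an ordinary predicate, several of them can be combined before filtering a list of reads. The following is a minimal sketch, not this module's code: `SimpleRead` is a hypothetical stand-in for `ReadBlock`, the helper names are invented for illustration, and it assumes that `True` means "discard", which this page does not state.

```haskell
-- Hypothetical stand-in for ReadBlock: only the called bases are modelled.
newtype SimpleRead = SimpleRead { readBases :: String }

-- Same shape as DiscardFilter, but over the stand-in type.
type Discard = SimpleRead -> Bool

-- ASSUMPTION: a filter returns True when the read should be discarded.
-- A read is rejected as soon as any one filter rejects it.
anyDiscard :: [Discard] -> Discard
anyDiscard fs r = any ($ r) fs

-- Illustrative filters only; not the library's definitions.
isEmpty :: Discard
isEmpty = null . readBases

shorterThan :: Int -> Discard
shorterThan n r = length (readBases r) < n

-- Keep only the reads that no filter rejects.
keepReads :: [Discard] -> [SimpleRead] -> [SimpleRead]
keepReads fs = filter (not . anyDiscard fs)
```

If the library's filters use the opposite convention (True meaning "retain"), the `not` in `keepReads` would need to be dropped.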
discard_empty :: DiscardFilter
This filter discards empty sequences.
discard_key :: String -> DiscardFilter
Discard sequences that don't have the given key tag (typically TCAG) at the start of the read.
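The key check amounts to a prefix test on the called bases. A minimal sketch, assuming the read is represented by a plain `String` of bases and that `True` means "discard"; `lacksKey` is a hypothetical name, and the case folding is a choice made for this example rather than documented behaviour.

```haskell
import Data.Char (toUpper)
import Data.List (isPrefixOf)

-- ASSUMPTION: True means "discard", and bases are given as a String.
-- The comparison ignores case so that "tcag" and "TCAG" match.
lacksKey :: String -> String -> Bool
lacksKey key bases = not (map toUpper key `isPrefixOf` map toUpper bases)
```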
discard_dots :: Double -> DiscardFilter
- 2.2.1.2 The "dots" filter discards sequences where the last positive flow is before flow 84, and sequences with >5% dots (i.e. three successive noise values) before the last positive flow. The percentage can be given as a parameter.
discard_mixed :: DiscardFilter
- 2.2.1.3 The "mixed" filter discards sequences with more than 70% positive flows. It also discards sequences with noise, 20% middle (0.45..0.75), or <30% positive.
discard_length :: Int -> DiscardFilter
Discard a read if the number of untrimmed flows is less than n (n=186 for Titanium).
Trimming filters
type TrimFilter = ReadBlock -> ReadBlock
TrimFilters modify the read, typically trimming it for quality.
trim_sigint :: TrimFilter
- 2.2.1.4 Signal intensity trim: trim back until <3% of flows are borderline (0.5..0.7). Then trim borderline values or dots from the end (use a window).
trim_primer :: String -> TrimFilter
- 2.2.1.5 Primer filter. This looks for the B-adaptor at the end of the read. The 454 implementation isn't very effective at finding mutated adaptors.
find_primer :: String -> ReadBlock -> Int
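As an illustration of the primer search, here is a minimal sketch that looks for an exact copy of the adapter and cuts the sequence there. It is not the library's `find_primer`: the names `findExact` and `trimAt` are hypothetical, the read is reduced to a `String` of bases, and returning the sequence length when nothing is found is an assumption. Exact matching will also miss mutated adaptors.

```haskell
import Data.List (findIndex, isPrefixOf, tails)
import Data.Maybe (fromMaybe)

-- 0-based offset of the first exact occurrence of the adapter.
-- ASSUMPTION: when the adapter is empty or absent, return the full
-- sequence length so that nothing gets trimmed.
findExact :: String -> String -> Int
findExact adapter bases
  | null adapter = length bases
  | otherwise =
      fromMaybe (length bases) (findIndex (adapter `isPrefixOf`) (tails bases))

-- Keep only the bases that precede the adapter.
trimAt :: String -> String -> String
trimAt adapter bases = take (findExact adapter bases) bases
```

A search that tolerates mismatches would replace the `isPrefixOf` test with a scoring function and a threshold.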
trim_qual20 :: Int -> TrimFilter
- 2.2.1.7 Quality score trimming: trims using a 10-base window until a Q20 average is found.
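One way to read the windowed Q20 trim is sketched below: starting from the 3' end, bases are dropped one at a time until the last `w` retained bases average at least Q20. This is an interpretation under stated assumptions, not the module's implementation. The names `mean` and `keepLength` are hypothetical, qualities are given as a plain list of `Int`, and whether `w` corresponds to the `Int` argument of `trim_qual20` is not confirmed by this page.

```haskell
-- Mean of a list of quality scores; 0 for an empty list.
mean :: [Int] -> Double
mean [] = 0
mean xs = fromIntegral (sum xs) / fromIntegral (length xs)

-- Number of leading bases to keep, given a window size w.
-- ASSUMPTION: reads shorter than the window are trimmed to length 0.
keepLength :: Int -> [Int] -> Int
keepLength w quals = go (length quals)
  where
    go n
      | n < w = 0
      -- The window covers the last w bases of the first n bases.
      | mean (take w (drop (n - w) quals)) >= 20 = n
      | otherwise = go (n - 1)
```

For example, with twelve bases at Q30 followed by eight at Q5 and a window of 10, the window first reaches an average of 20 when 16 bases are retained.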
Utility functions
clipFlows :: ReadBlock -> Int -> ReadBlock
Translate a number of flows to a position in the sequence, and update the clipping data accordingly.
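The flow-to-sequence translation can be sketched as follows. In the SFF format, each base carries a flow index stored as an increment relative to the previous base, so a running sum gives the flow in which each base was called. The function name `basesWithinFlows` is hypothetical, the flow index is modelled as a plain `[Int]`, and updating the clipping fields of the read header is left out.

```haskell
-- ASSUMPTION: flowIdx holds per-base increments (SFF flow index),
-- and flows are numbered from 1.
-- Returns how many bases were called within the first n flows.
basesWithinFlows :: [Int] -> Int -> Int
basesWithinFlows flowIdx n =
  length (takeWhile (<= n) (scanl1 (+) flowIdx))
```

An increment of 0 marks an additional base called in the same flow as its predecessor (a homopolymer), which is why the comparison uses `<=`.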
Data
flx_linker :: String