

Owing to their critical functions, it is necessary to systematically identify translated smORFs derived from lncRNAs and to explore their potential physiological and pathological functions in order to comprehensively elucidate the building blocks of living systems. Precise identification of translated smORFs derived from lncRNAs is a prerequisite for their functional study (Kong et al., 2007; Olexiouk et al., 2016; Xiao et al., 2018). However, evaluating the protein-coding potential of smORFs remains challenging for conventional prediction methods. Traditional prediction of translated ORFs relies mainly on ORF size, evolutionary sequence conservation, and mass spectrometry (MS) data, whereas the features of smORFs differ substantially from those of the translated ORFs of protein-coding genes.
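To make the size criterion concrete, the following minimal sketch scans a transcript sequence for ATG-initiated ORFs that encode no more than 100 amino acids. It is an illustration only and not the classifier used in this study: the function name `find_smorfs`, the lower bound `min_aa`, and the restriction to canonical ATG start codons on the given strand are assumptions introduced here.

```python
# Hypothetical helper for illustration; not the method used in this study.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_smorfs(seq, max_aa=100, min_aa=10):
    """Return sorted (start, end, n_aa) tuples for candidate smORFs.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    n_aa counts encoded residues (including the initiator Met) and
    excludes the stop codon. min_aa is an assumed lower bound.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG since the previous stop
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # resume the search after this stop codon
    return sorted(hits)
```

A size filter of this kind only enumerates candidates. It says nothing about whether a candidate is actually translated, which is why evidence such as ribosome-protected fragment sequencing or MS data is still required.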

Hepatocellular carcinoma (HCC) accounts for more than 90% of primary liver cancers and is the sixth most common malignancy worldwide. Moreover, it is the third leading cause of cancer death. Despite intensive investigations and therapeutic improvements, the 5-year overall survival rate for HCC is merely 18% (Siegel et al., 2019), highlighting the urgent need to clarify novel mechanisms contributing to liver malignancy. Recent genome-wide studies have revealed that small open reading frames (smORFs) concealed in long non-coding RNAs (lncRNAs) can encode micropeptides (≤100 amino acids) with essential roles in the regulation of physiological and pathological processes in various species (Guttman et al., 2013; Magny et al., 2013; Bazzini et al., 2014; Pauli et al., 2014; Anderson et al., 2015; Calviello et al., 2016). For example, in Drosophila, the lncRNA pncr003:2L encodes two micropeptides that regulate cardiac contraction (Magny et al., 2013). In zebrafish, a micropeptide called Toddler activates the extracellular-signal-regulated kinase pathway to promote embryogenesis (Pauli et al., 2014). In humans, the lncRNA-encoded micropeptide myoregulin (MLN) is an important regulator of skeletal muscle performance: it directly inhibits the sarco/endoplasmic reticulum calcium-ATPase and thereby controls muscle relaxation by regulating calcium ion uptake into the sarcoplasmic reticulum (Anderson et al., 2015). More importantly, micropeptides encoded by lncRNAs have been demonstrated to have essential roles in tumorigenesis. For example, the lncRNA HOXB-AS3 encodes a 53 amino acid micropeptide that affects colon cancer cell metabolism and suppresses cancer progression by competitively binding the RNA-binding protein hnRNP A1 to inhibit the splicing of pyruvate kinase (Huang et al., 2017).
In this study, we systematically identified translated smORFs derived from lncRNAs and explored their potential pathological functions in cancer to improve our understanding of the building blocks of living systems. Functional studies revealed that ZFAS1 can promote cancer cell migration by elevating intracellular reactive oxygen species production through inhibition of nicotinamide adenine dinucleotide dehydrogenase expression, indicating that translated ZFAS1 may be an essential oncogene in the progression of HCC.

Since the identification of translated smORFs remains technically challenging, little is known of their pathological functions in cancer. Therefore, we created classifiers to identify translated smORFs derived from lncRNAs based on ribosome-protected fragment sequencing and machine learning methods. In total, 537 putative translated smORFs were identified, and the coding potential of five smORFs was experimentally validated via green fluorescent protein-tagged protein generation and mass spectrometry. After analyzing 11 lncRNA expression profiles of seven cancer types, we identified one validated translated lncRNA, ZFAS1, which was significantly up-regulated in hepatocellular carcinoma (HCC).

Micropeptides (≤100 amino acids) are essential regulators of physiological and pathological processes and can be encoded by small open reading frames (smORFs) derived from long non-coding RNAs (lncRNAs). Recently, lncRNA-encoded micropeptides have been shown to have essential roles in tumorigenesis.
