### Introduction

### Methods

### Gene sequences

### Mathematical analysis

*H(XY)* is the entropy of the product of the random variables X and Y, i.e. their joint entropy.

*0 ≤ C(X;Y) ≤ 1*.

*C(X;Y)* = 0 if and only if the random variables X and Y are independent (no correlation between the variables).

*C(X;Y)* = 1 if and only if there is a functional relation (correlation or influence) between X and Y.

Let *x(n) = (x(1), x(2), …, x(n), …)* represent a discrete time series having symbolic values.

Let *x(n + j) = (x(1 + j), x(2 + j), …, x(n + j), …)* be the time series *x(n)* with a lag *j*.

The mutual information of *x(n)* with a lag *j* equals:

*I(x(n); x(n + j)) = H(x(n)) + H(x(n + j)) – H(x(n), x(n + j))*.
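The identity above can be sketched in code. This is a minimal illustration, assuming entropies are estimated from empirical symbol frequencies and measured in bits; the function names `entropy` and `lagged_mutual_information` and the example sequence are hypothetical and do not come from the original text.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Empirical Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def lagged_mutual_information(seq, lag):
    """I(x(n); x(n+j)) = H(x(n)) + H(x(n+j)) - H(x(n), x(n+j)) for a lag j >= 1."""
    if not 1 <= lag < len(seq):
        raise ValueError("lag must satisfy 1 <= lag < len(seq)")
    head = seq[:-lag]              # x(n)
    tail = seq[lag:]               # x(n + j)
    pairs = list(zip(head, tail))  # joint observations (x(n), x(n + j))
    return entropy(head) + entropy(tail) - entropy(pairs)


# Hypothetical nucleotide string, used only to show the call.
example = "ACGTACGTTGCAACGT"
mi_lag_1 = lagged_mutual_information(example, 1)
```

Because both marginal entropies are computed from the same overlapping windows as the joint entropy, the estimate is non-negative up to floating-point error.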

*x(n)* with a lag *j* equals:

*C(x(n); x(n + j))* is then calculated as a function of the lag *j*.

*F(j) = C(x(n); x(n + j))* as the information function of the discrete time series *x(n)*.

*0 ≤ F(j) ≤ 1*.

*F(j)* = 0 if and only if *x(n)* and *x(n + j)* are mutually independent.

*F(j)* = 1 if and only if there exists a functional relationship between *x(n)* and *x(n + j)*.
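The properties of *F(j)* can be checked numerically with a small sketch. The exact normalization used for *C* is not recoverable from the text here, so the code below assumes *F(j) = I(x(n); x(n + j)) / H(x(n + j))*, which is one normalization that satisfies the stated bounds and both boundary cases. The name `information_function` and the convention for a constant series are also assumptions made for illustration.

```python
from collections import Counter
from math import log2


def _entropy(symbols):
    """Empirical Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_function(seq, max_lag):
    """Return [F(1), ..., F(max_lag)] for a symbolic sequence.

    ASSUMPTION: F(j) = I(x(n); x(n+j)) / H(x(n+j)). The source does not
    state the normalization, so this choice is illustrative only.
    """
    if not 1 <= max_lag < len(seq):
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(seq)")
    values = []
    for lag in range(1, max_lag + 1):
        head, tail = seq[:-lag], seq[lag:]
        h_tail = _entropy(tail)
        if h_tail == 0:
            # Constant series: the ratio is undefined; 0.0 is a convention chosen here.
            values.append(0.0)
            continue
        mutual = _entropy(head) + h_tail - _entropy(list(zip(head, tail)))
        values.append(mutual / h_tail)
    return values


# A strictly periodic sequence: each symbol determines the symbol j steps later.
periodic_values = information_function("ACGT" * 25, 4)
```

For the periodic input every value is 1 up to floating-point error, which matches the functional-relationship case; under this normalization the values for any input stay within [0, 1].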

Let *{x1(n), x2(n), …, xk(n)}* be a set of discrete time series whose elements are symbols, e.g. gene nucleotide sequences, *n = 1, 2, 3, …*, where the maximum value of *n* for a sequence *xi(n)*, *1 ≤ i ≤ k*, equals the number of elements in this nucleotide sequence.

*{x1(n), x2(n), …, xk(n)}*consists of three procedures: (1) construction of an information function matrix; (2) ranking of columns of the information function matrix; and (3) application of a multiple comparisons method.

*Construction of an information function matrix*

For each time series *xi(n)*, *1 ≤ i ≤ k*, we construct the information function as follows:

*Fi(j) = C(xi(n); xi(n + j))*, *1 ≤ i ≤ k, 1 ≤ j ≤ m*.

This gives a *k × m* matrix *[Fi(j)]* of values of the information functions, i.e., a matrix where each row is the information function of the corresponding time series.

*Ranking of columns of the information function matrix*

Each row of the *[Fi(j)]* matrix is the information function of one time series, and each column contains the values of the information functions corresponding to the same lag.

For each column of the *[Fi(j)]* matrix, we rank its entries and assign the rank 1 to the smallest entry of the column. We obtain a *k × m* matrix of ranks *[ri(j)]*, with each column of the matrix containing ranks from 1 to *k*.
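The column-ranking step can be sketched as follows. The matrix values are hypothetical and serve only to show the mechanics; the function name `rank_columns` is likewise illustrative. The text does not say how ties are handled, so the sketch assumes tied entries share the average of the ranks they span.

```python
def rank_columns(matrix):
    """Rank entries within each column; the smallest entry of a column gets rank 1.

    ASSUMPTION: tied entries receive the average of the ranks they span.
    """
    k = len(matrix)      # number of rows (time series)
    m = len(matrix[0])   # number of columns (lags)
    ranks = [[0.0] * m for _ in range(k)]
    for j in range(m):
        order = sorted(range(k), key=lambda i: matrix[i][j])
        pos = 0
        while pos < k:
            end = pos
            # Extend the run while the next entry ties with the current one.
            while end + 1 < k and matrix[order[end + 1]][j] == matrix[order[pos]][j]:
                end += 1
            avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
            for t in range(pos, end + 1):
                ranks[order[t]][j] = avg_rank
            pos = end + 1
    return ranks


# Hypothetical 3 x 4 matrix [Fi(j)]: 3 time series, 4 lags.
F = [
    [0.10, 0.30, 0.20, 0.05],
    [0.40, 0.10, 0.50, 0.25],
    [0.20, 0.20, 0.10, 0.15],
]
R = rank_columns(F)
rank_sums = [sum(row) for row in R]  # row sums of the rank matrix
```

Each column of `R` is a permutation of 1 to *k* when there are no ties, so the row sums always add up to *m·k(k + 1)/2*.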

We estimate the element interconnection of the *i*-th time series, relative to that of the other time series, by the sum of all the elements of the *i*-th row of the matrix *[ri(j)]*. Such an estimate allows us to use multiple comparisons of rank statistics to compare the interconnection of the time series.

### Results

### The values and clustering of gene information functions

*[Fi(j)]*, *1 ≤ i ≤ 14, 1 ≤ j ≤ 12* (Table 2).

*[ri(j)]*, *1 ≤ i ≤ 14, 1 ≤ j ≤ 12* (Table 3).

*Hypotheses*:

*Critical range*:

$\chi^2_{13}$ distribution.

$\chi^2$-criterion. This gives us $\chi^2$ = 91.65. The critical range is $\chi^2_{13}$ > 27.69. Since 91.65 > 27.69, the null hypothesis with respect to Table 3 is rejected. Thus, according to the Friedman test, a row effect has been found. Hence, there is a difference between the rows under consideration.
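For readers who want to reproduce this kind of test, the sketch below computes a Friedman statistic from row rank sums. The formula is not given in the text, so the code assumes the standard form without tie correction, χ² = 12 / (m·k·(k + 1)) · Σ Rᵢ² − 3·m·(k + 1), where *k* is the number of rows being compared and *m* the number of columns. The function name and the toy rank sums are hypothetical; only the values 91.65 and 27.69 are taken from the text.

```python
def friedman_statistic(rank_sums, n_blocks):
    """Friedman chi-square statistic from per-row rank sums.

    ASSUMPTION: standard form without tie correction; compared against a
    chi-square distribution with len(rank_sums) - 1 degrees of freedom.
    """
    k = len(rank_sums)
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 * sum_sq / (n_blocks * k * (k + 1)) - 3.0 * n_blocks * (k + 1)


# Toy input: 3 rows ranked within each of 4 columns (hypothetical numbers).
toy_statistic = friedman_statistic([7, 10, 7], n_blocks=4)  # 1.5 for this input

# Decision rule with the values reported in the text (13 degrees of freedom).
reported_statistic = 91.65
critical_value = 27.69
reject_null = reported_statistic > critical_value
```

With 14 rows and 12 columns the statistic can be at most 12 × 13 = 156, reached when every column ranks the rows identically.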

$|R_i - R_{i+1}|$ > 8.93, where $R_i$ and $R_{i+1}$ are elements of the column “Sum of ranks” in the *i*-th and (*i + 1*)-th rows of Table 3, respectively. By multiple comparisons, we construct the clustering shown in Table 4.
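One way to turn such a threshold into clusters is sketched below. It assumes the rows are sorted by their sum of ranks and that a new cluster starts whenever two neighboring sums differ by more than the threshold; the text does not spell out the exact procedure, so this is an illustration rather than the authors' algorithm. The labels and rank sums are hypothetical; only the threshold 8.93 comes from the text.

```python
def cluster_by_adjacent_gaps(rank_sums, threshold):
    """Split labels into clusters at gaps larger than `threshold`.

    ASSUMPTION: rows are compared only with their neighbors after sorting
    by rank sum; `rank_sums` maps a label to its sum of ranks.
    """
    ordered = sorted(rank_sums.items(), key=lambda item: item[1])
    if not ordered:
        return []
    clusters = [[ordered[0][0]]]
    for (_, prev_sum), (label, cur_sum) in zip(ordered, ordered[1:]):
        if abs(cur_sum - prev_sum) > threshold:
            clusters.append([label])    # significant gap: start a new cluster
        else:
            clusters[-1].append(label)  # no significant gap: same cluster
    return clusters


# Hypothetical rank sums for five placeholder labels.
sums = {"gene_a": 20, "gene_b": 25, "gene_c": 40, "gene_d": 44, "gene_e": 70}
clusters = cluster_by_adjacent_gaps(sums, threshold=8.93)
```

Comparing only neighbors keeps the sketch short, but it can chain together rows whose extreme members differ by more than the threshold, so a full implementation would also need to check pairs within each cluster.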

$\alpha_T$ = 0.01); (2) Elements belonging to the same set do not differ from each other ($\alpha_T$ = 0.01).

$\alpha_T$ = 0.01). The same holds true for cluster 3 (BECN1 gene), cluster 4 (mTOR gene), and cluster 7 (IGF1 gene).

### The significance of gene information functions

*Lag j* as the sum of elements of the column *Lag j* of Table 5. Let us consider Table 5 as the Friedman statistical model, and examine the column effect of this table.

*Hypotheses*:

*Critical range*:

$\chi^2_{11}$ distribution.

$\chi^2$-criterion. This gives us $\chi^2$ = 121.5. The critical range is $\chi^2_{11}$ > 24.73. Since 121.5 > 24.73, the null hypothesis with respect to Table 5 is rejected. Thus, according to the Friedman test, a column effect has been found. Hence, there is a difference between the columns under consideration.

$|R_i - R_{i+1}|$ > 9.64, where $R_i$ and $R_{i+1}$ are elements of the column “Sum of ranks” in the *i*-th and (*i + 1*)-th rows of Table 5, respectively. By multiple comparisons, we construct the clustering shown in Table 6.

$\alpha_T$ = 0.01); (2) Elements belonging to the same set do not differ from each other ($\alpha_T$ = 0.01).

$\alpha_T$ = 0.01). The same holds true for cluster 2 (Lag 2), cluster 3 (Lag 6), and cluster 4 (Lag 3).