Abstract:
This study aims to quantify the syntactic distance between any two words in a phrase-structure tree and to elucidate its cross-linguistic features and its significance for measuring syntactic complexity. Integrating path length, hierarchical depth, and intervening word count, we propose a tree-path-based method for computing syntactic distance. A quantitative analysis of treebanks in nine languages yields the following findings: 1) syntactic distance reflects the cognitive effort associated with language processing and serves as a metric of cross-linguistic syntactic complexity; 2) the mean syntactic distances of human languages are constrained by working-memory capacity and probably fall in the range of 4 to 6; 3) syntactic distance follows a negative binomial distribution and correlates positively with sentence length; 4) these properties of syntactic distance hold across languages. The findings suggest that syntactic distance is a pivotal theoretical construct with a value comparable to that of dependency distance. This study supplements phrase-structure grammar with a universally meaningful quantitative metric, making it possible to apply quantitative syntactic analysis more extensively to fields such as linguistic typology, comparative linguistics, and second language teaching.