1 min readJan 22, 2019
Therefore, the deeper our Neural Network is (i.e. the more layers it has), the more information Sigmoid compresses and loses at each layer; these losses compound layer by layer and cause major signal loss overall.
Due to the above drawback, ReLU has two major advantages over Sigmoid: sparsity and a reduced likelihood of vanishing gradients. Google these terms or join a course like the deeplearning.ai one on Coursera to learn more.
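A tiny sketch (plain Python, function names my own) makes both points concrete: the sigmoid's derivative never exceeds 0.25, so the gradient through many stacked sigmoid layers shrinks at least geometrically, while ReLU passes gradients through at full strength for positive inputs and outputs exact zeros for negative ones (sparsity).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s), which peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0

# Through n stacked sigmoid layers, the chained gradient factor is at most 0.25**n
n = 10
print(sigmoid_grad(0.0))   # 0.25, the best case per layer
print(0.25 ** n)           # upper bound after 10 layers: the gradient vanishes
print(relu_grad(3.0))      # 1.0: ReLU passes the gradient through unchanged
print(relu_grad(-3.0))     # 0.0: negative inputs produce exact zeros (sparsity)
```

This is why the vanishing-gradient problem bites deep sigmoid networks in particular, and why ReLU became the default hidden-layer activation.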