spark

Spark’s primary abstraction is a distributed collection of items called a Dataset. Datasets can be created from Hadoop InputFormats (such as HDFS files) or by transforming other Datasets.

Useful Links

Website - browse the project homepage
Documentation - read documentation and usage guides
Downloads - latest development builds

What does it do?

spark is made up of a