Summary: If you want to play around with Spark without setting up a Hadoop installation/cluster etc., download a Spark package that is built with a version of Hadoop - i.e. pick a package like "Pre-built for Hadoop 2.6 and later", not the one that says "Pre-built with user-provided Hadoop...".
Details
I downloaded Apache Spark from here, unzipped it and tried to run it like so:
.\bin\spark-shell --master local
And here's what I got (exception edited for brevity):
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
...
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
...
Googling gave me suggestions like installing Hadoop - but I was pretty sure that I should be able to run Spark without the whole Hadoop infrastructure. I finally realised that I had downloaded the "Pre-built with user-provided Hadoop" package ("Spark without Hadoop"), and that works only if you point it to a Hadoop installation that you already have. The reason is that though Spark doesn't need a Hadoop cluster to work, it does need some Hadoop libraries on its classpath (which is what the ClassLoader is complaining about in the stack trace above).
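(As an aside: if you do want to stick with the "Hadoop free" build, the Spark docs describe wiring it up to an existing Hadoop installation by setting SPARK_DIST_CLASSPATH. A minimal sketch, assuming you have Hadoop installed with its hadoop command on your PATH - on Linux/macOS this would go in conf/spark-env.sh:)
# conf/spark-env.sh - point the "Hadoop free" Spark build at your Hadoop jars
export SPARK_DIST_CLASSPATH=$(hadoop classpath)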
So I downloaded a Spark distribution with Hadoop and voila!
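As a quick smoke test that the pre-built package works end to end, you can run something like this once the shell comes up (the Scala one-liner is just an illustration, not from my original session):
.\bin\spark-shell --master local
scala> sc.parallelize(1 to 100).sum()   // should print res0: Double = 5050.0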