Friday, March 26, 2021

Python Study notes: How to run Scala and Spark in the Jupyter notebook

This tutorial shows how to run Scala and Spark in a Jupyter notebook using spylon-kernel, an additional Jupyter kernel that has to be installed separately. The same setup can also be used for general Scala development.

## Prerequisites
* Apache Spark 2.1.1 compiled for Scala 2.11
* Jupyter Notebook
* Python 3.5+
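
Before installing anything, it can help to confirm the prerequisites above. The following is a minimal sketch, not part of spylon-kernel: the helper name `missing_prerequisites` is hypothetical, and the check that Spark is located through the `SPARK_HOME` environment variable is an assumption about your setup.

```python
import os
import shutil
import sys


def missing_prerequisites(version=sys.version_info, environ=os.environ, which=shutil.which):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # The prerequisites list asks for Python 3.5 or newer.
    if tuple(version[:2]) < (3, 5):
        problems.append("Python 3.5+ is required")
    # Jupyter Notebook should expose a `jupyter` command on PATH.
    if which("jupyter") is None:
        problems.append("the jupyter command is not on PATH")
    # Assumption: Spark is found through SPARK_HOME; adjust if your setup differs.
    if not environ.get("SPARK_HOME"):
        problems.append("SPARK_HOME is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Missing:", problem)
```

If the script prints nothing, all three checks passed. It does not verify the Spark or Scala version.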

Step1: install the package using `pip` or `conda`

```bash
pip install spylon-kernel
# or
conda install -c conda-forge spylon-kernel
```

Step2: create a kernel spec
This will allow us to select the Scala kernel in the notebook.

```bash
python -m spylon_kernel install
```
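
To confirm that the kernel spec was registered, one option is to read the output of `jupyter kernelspec list --json`. This is a rough sketch: the helper names `kernel_names` and `list_installed_kernels` are hypothetical, and the kernel name `spylon-kernel` is assumed from the menu entry used in these notes.

```python
import json
import shutil
import subprocess


def kernel_names(kernelspec_json):
    """Return the sorted kernel names found in `jupyter kernelspec list --json` output."""
    return sorted(json.loads(kernelspec_json).get("kernelspecs", {}))


def list_installed_kernels():
    """Ask Jupyter for its kernel specs; return [] if jupyter is missing or the call fails."""
    if shutil.which("jupyter") is None:
        return []
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return kernel_names(result.stdout)


if __name__ == "__main__":
    names = list_installed_kernels()
    # Assumption: the kernel registers itself under the name "spylon-kernel".
    print("spylon-kernel registered:", "spylon-kernel" in names)
    print("all kernels:", names)
```

The same check can be done by hand by running `jupyter kernelspec list` in a terminal and looking for the kernel in the output.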

Step3: start the Jupyter notebook

```bash
jupyter notebook
```

Step4: in the notebook, select New -> spylon-kernel. This will start our Scala kernel.

Step5: test the notebook

```scala
val x = 2
val y = 3
x + y
```

Test: run Python code in a cell with the `%%python` cell magic:

```python
%%python
x = 2
print(x)
```

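Test: we can even use Spark to create a dataset:

```scala
// createDataset needs an implicit Encoder for the tuple type.
// The import is harmless if the kernel already provides it.
import spark.implicits._

val data = Seq((1,2,3), (4,5,6), (6,7,8), (9,19,10))
val ds = spark.createDataset(data)
ds.show()
```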
