Hypothesis

29 Matching Annotations

Dec 2024
book.originit.top

Getting Started with Spark from Scratch_Big Data_Data_Spark_Spark Basics_Data Analysis - 极客时间 (Geek Time)

1
1. xuchen.xia 08 Dec 2024
  
  in Public
  
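  coalesce lowers the parallelism of the computation within the same stage, which leads to low CPU utilization and longer task execution time. We currently have an implementation that needs to write the final result as a single Avro file. The preceding transformations can be of many kinds, and in the final step we add repartition(1).write().format('avro').mode('overwrite').save('path'). We recently found that when the preceding transformations include a sort, the single file written with repartition(1) is sometimes in the wrong order, whereas with coalesce(1) the order is correct, but coalesce(1) has performance problems. The option we have thought of so far is to collect to d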
  
  https://stackoverflow.com/questions/31610971/spark-repartition-vs-coalesce
  
  Spark performance
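The ordering difference described in this annotation can be illustrated without a cluster. What follows is a minimal pure-Python sketch, assuming a globally sorted dataset stored as ordered range partitions. `coalesce_to_one` and `repartition_to_one` are hypothetical helper names, not Spark APIs: the first models a shuffle-free merge that reads partitions in index order, the second models a shuffle whose single reducer may receive map-side blocks in any order.

```python
import random

def coalesce_to_one(partitions):
    """Model coalesce(1): no shuffle, partitions are read in index order."""
    return [row for part in partitions for row in part]

def repartition_to_one(partitions, seed):
    """Model repartition(1): a shuffle whose reducer may receive the
    map-side blocks in an arbitrary order (simulated here with a seed)."""
    blocks = list(partitions)
    random.Random(seed).shuffle(blocks)
    return [row for block in blocks for row in block]

# A globally sorted dataset split into three range partitions.
sorted_partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

coalesced = coalesce_to_one(sorted_partitions)
shuffled_runs = [repartition_to_one(sorted_partitions, seed) for seed in range(20)]
```

In this model every run contains the same rows, but only the shuffle-free merge is guaranteed to keep them sorted. In real Spark, one option sometimes suggested is to apply the sort after the shuffle, for example with `sortWithinPartitions` on the single partition, so that upstream stages keep their parallelism. Whether that fits a given pipeline is an assumption to verify.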

Tags

Spark performance

Annotators

xuchen.xia

URL

book.originit.top/single
Dec 2022
www.zhihu.com

What are the respective prospects of Apache Flink and Apache Spark? - Zhihu

1
1. caocao485 14 Dec 2022
  
  in Public
  
  What are the respective prospects of Apache Flink and Apache Spark?
  
  Spark software engineering big data

Tags

Spark

big data

software engineering

Annotators

caocao485

URL

zhihu.com/question/30151872
www.zhihu.com

Computing user similarity for tens of millions of users with Spark? - Zhihu

1
1. caocao485 13 Dec 2022
  
  in Public
  
  Computing user similarity for tens of millions of users with Spark?
  
  spark machine learning big data

Tags

spark

big data

machine learning

Annotators

caocao485

URL

zhihu.com/question/265901363
Sep 2022
www.smithsonianmag.com

Can Climate Fiction Writers Reach People in Ways That Scientists Can't?

1
1. sidnt 09 Sep 2022
  
  in Public
  
  When it comes to understanding our planet and its future, can novelists reach people in ways that scientists cannot?
  
  .
  
  debate-spark

Tags

debate-spark

Annotators

sidnt

URL

smithsonianmag.com/innovation/can-climate-fiction-writers-reach-people-ways-scientists-cant-180977714/
Jun 2022
community.databricks.com

How can I split a Spark Dataframe into n equal Dataframes (by rows)? I tried to add a Row ID column to achieve this but was unsuccessful.

1
1. wenijinew 02 Jun 2022
  
  in Public
  
  You can use the DataFrame's randomSplit function
  
  split dataframe
  
  spark dataframe
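In PySpark the suggestion in this annotation would be a call such as `df.randomSplit([1.0, 1.0, 1.0, 1.0], seed=42)`. Because each row is assigned independently according to the normalized weights, the resulting parts are only approximately equal in size. The pure-Python sketch below models that behavior and contrasts it with an exact split. `random_split` and `equal_chunks` are hypothetical helpers written for illustration, not Spark functions.

```python
import random

def random_split(rows, weights, seed):
    """Assign each row to one split by drawing a uniform number and
    locating it in the cumulative, normalized weight ranges."""
    total = float(sum(weights))
    bounds = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        x = rng.random()
        for i, upper in enumerate(bounds):
            # The last split also absorbs any floating-point shortfall.
            if x < upper or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits

def equal_chunks(rows, n):
    """Deterministic alternative: n contiguous chunks whose sizes
    differ by at most one row."""
    size, extra = divmod(len(rows), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks

rows = list(range(1000))
random_parts = random_split(rows, [1.0, 1.0, 1.0, 1.0], seed=42)
exact_parts = equal_chunks(rows, 4)
```

If the parts must have exactly the same number of rows, as the question title asks, a random split alone may not be enough. The exact split here assumes the rows can be given a stable order.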

Tags

spark

dataframe

Annotators

wenijinew

URL

community.databricks.com/s/question/0D53f00001HKHfHCAX/how-can-i-split-a-spark-dataframe-into-n-equal-dataframes-by-rows-i-tried-to-add-a-row-id-column-to-acheive-this-but-was-unsuccessful
May 2022
spark.apache.org

Configuration - Spark 2.2.0 Documentation

1
1. wenijinew 30 May 2022
  
  in Public
  
  Spark properties mainly can be divided into two kinds: one is related to deploy, like “spark.driver.memory”, “spark.executor.instances”, this kind of properties may not be affected when setting programmatically through SparkConf in runtime, or the behavior is depending on which cluster manager and deploy mode you choose, so it would be suggested to set through configuration file or spark-submit command line options; another is mainly related to Spark runtime control, like “spark.task.maxFailures”, this kind of properties can be set in either way.
  
  spark properties
  
  spark properties
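The distinction in the quoted passage can be sketched as a small helper that separates launch-time properties from the rest and passes the former to `spark-submit` through `--conf`. This is a minimal sketch: the `DEPLOY_RELATED` set contains only the two examples named in the quote, treating every other property as runtime-settable is a simplifying assumption, and `my_job.py` is a placeholder application name.

```python
# Properties the quoted passage names as deploy-related. The set is
# illustrative and deliberately limited to those two examples.
DEPLOY_RELATED = {"spark.driver.memory", "spark.executor.instances"}

def partition_properties(props):
    """Split properties into (launch-time, runtime-settable) dicts.
    Anything not in DEPLOY_RELATED is assumed to be runtime-settable."""
    launch = {k: v for k, v in props.items() if k in DEPLOY_RELATED}
    runtime = {k: v for k, v in props.items() if k not in DEPLOY_RELATED}
    return launch, runtime

def spark_submit_args(app, launch_props):
    """Build a spark-submit argument list, one --conf per property."""
    args = ["spark-submit"]
    for key in sorted(launch_props):
        args += ["--conf", f"{key}={launch_props[key]}"]
    args.append(app)
    return args

props = {
    "spark.driver.memory": "4g",
    "spark.executor.instances": "8",
    "spark.task.maxFailures": "8",
}
launch, runtime = partition_properties(props)
command = spark_submit_args("my_job.py", launch)  # placeholder app name
```

The `runtime` dictionary could then be applied inside the application, for example through `SparkConf().set(key, value)`, while the `command` list carries the settings that should be fixed before the driver starts.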

Tags

spark

properties

Annotators

wenijinew

URL

spark.apache.org/docs/latest/configuration.html
Local file

Building a Second Brain: A Proven Method to Organize Your Digital Life and Unlock Your Creative Potential

1
1. chrisaldrich 24 May 2022
  
  in Public
  
  keep only what resonates in a trusted place that you control, and to leave the rest aside
  
  Though it may lead down the road to the collector's fallacy, one should note down, annotate, or highlight the things that resonate with them. Similar to Marie Kondo's concept in home organization, one ought to find ideas that "spark joy" or move them internally. These have a reasonable ability to be reused or turned into something with a bit of coaxing and work. Collect now to be able to filter later.
  
  spark joy Marie Kondo note taking
Tags

note taking

spark joy

Marie Kondo

Annotators

chrisaldrich
community.cloudera.com

Spark - Cannot mkdir file

1
1. wenijinew 12 May 2022
  
  in Public
  
  job.local.dir
  
  to save data by spark
  
  spark

Tags

spark

Annotators

wenijinew

URL

community.cloudera.com/t5/Support-Questions/Spark-Cannot-mkdir-file/m-p/67896
Apr 2022
www2.le.ac.uk

What is an e-tivity? — University of Leicester

1
1. Joana.Marinho 12 Apr 2022
  
  in Public
  
  E-tivities generally involve the tutor providing a small piece of information, stimulus or challenge, which Salmon refers to as the 'spark'.
  
  Indeed, these e-tivities are exactly that: an important stimulus in this new mode of teaching. Students need to feel part of the "classroom" and to feel motivated to learn.
  
  "spark"

Tags

"spark"

Annotators

Joana.Marinho

URL

www2.le.ac.uk/departments/geography/projects/tri-orm/archived-advanced-orm/advancedorm/course-materials/etivity
Jan 2022
www.apartmenttherapy.com

“Tokimeku” Means So Much More Than “Spark Joy” in Japanese

1
1. chrisaldrich 20 Jan 2022
  
  in Public
  
  https://www.apartmenttherapy.com/marie-kondo-tokimeku-spark-joy-translation-266496
  
  on the translation of tokimeku, or ときめく, as "spark joy"
  
  translations language spark joy ときめく read Japanese

Tags

ときめく

language

Japanese

translations

spark joy

read

Annotators

chrisaldrich

URL

apartmenttherapy.com/marie-kondo-tokimeku-spark-joy-translation-266496
May 2021
spark.apache.org

FAQ | Apache Spark

1
1. akshayraj.kore 29 May 2021
  
  in Public
  
  local[N]
  
  Spark

Tags

Spark

Annotators

akshayraj.kore

URL

spark.apache.org/faq.html
Apr 2021
datamechanics.co

Apache Spark 3.1 Release: Spark on Kubernetes is now Generally Available

1
1. pyxelr 01 Apr 2021
  
  in Public
  
  With Spark 3.1, the Spark-on-Kubernetes project is now considered Generally Available and Production-Ready.
  
  With Spark 3.1 k8s becomes the right option to replace YARN
  
  Spark Kubernetes MLOps YARN

Tags

Spark

YARN

MLOps

Kubernetes

Annotators

pyxelr

URL

datamechanics.co/blog-post/apache-spark-3-1-release-spark-on-kubernetes-is-now-ga
Feb 2021
itnext.io

Machine Learning Model Serving Options

1
1. pyxelr 26 Feb 2021
  
  in Public
  
  Consider the amount of data and the speed of the data, if low latency is your priority use Akka Streams, if you have huge amounts of data use Spark, Flink or GCP DataFlow.
  
  For low latency = Akka Streams
  
  For huge amounts of data = Spark, Flink or GCP DataFlow
  
  MLOps Spark Akka Flink GCP

Tags

MLOps

GCP

Flink

Spark

Akka

Annotators

pyxelr

URL

itnext.io/machine-learning-model-serving-options-1edf790d917
blog.doist.com

7 Ways to Support Your Team During the Pandemic — And Any Crisis

1
1. doitian 21 Feb 2021
  
  in Public
  
  Drink a cup of caffeinated beverage beforehand
  
  spark

Tags

spark

Annotators

doitian

URL

blog.doist.com/imposter-syndrome-new-managers/
Jun 2020
ceur-ws.org

paper43.pdf

1
1. SamRose 11 Jun 2020
  
  in Public
  
  food21 spark kafka sparql rdf

Tags

kafka

rdf

spark

food21

sparql

Annotators

SamRose

URL

ceur-ws.org/Vol-1690/paper43.pdf
Sep 2019
jaceklaskowski.gitbooks.io

Spark on Mesos · The Internals of Apache Spark

1
1. SamRose 15 Sep 2019
  
  in Public
  
  spark mesos

Tags

spark

mesos

Annotators

SamRose

URL

jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-mesos/spark-mesos.html
blog.minio.io

Modern Data Lake with Minio : Part 2

1
1. SamRose 02 Sep 2019
  
  in Public
  
  hive spark presto minio

Tags

spark

presto

minio

hive

Annotators

SamRose

URL

blog.minio.io/modern-data-lake-with-minio-part-2-f24fb5f82424
medium.com

Setup a 3-node Hadoop-Spark-Hive cluster from scratch using Docker

1
1. SamRose 01 Sep 2019
  
  in Public
  
  hadoop hive spark apache spark sparkml

Tags

apache spark

spark

hadoop

hive

sparkml

Annotators

SamRose

URL

medium.com/@aditya.pal/setup-a-3-node-hadoop-spark-hive-cluster-from-scratch-using-docker-332dae6b98d0
Mar 2019
testdriven.io

Deploying Spark on Kubernetes

1
1. SamRose 29 Mar 2019
  
  in Public
  
  apache spark sparkml kubernetes

Tags

apache spark

kubernetes

sparkml

Annotators

SamRose

URL

testdriven.io/blog/deploying-spark-on-kubernetes/
Dec 2018
docs.aws.amazon.com

Apache Livy - Amazon EMR

1
1. SamRose 31 Dec 2018
  
  in Public
  
  aws emr apache livy apache spark

Tags

apache spark

apache livy

aws emr

Annotators

SamRose

URL

docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-livy.html
aws.amazon.com

Build a Concurrent Data Orchestration Pipeline Using Amazon EMR and Apache Livy | Amazon Web Services

1
1. SamRose 31 Dec 2018
  
  in Public
  
  apache livy spark aws emr airflow

Tags

spark

apache livy

airflow

aws emr

Annotators

SamRose

URL

aws.amazon.com/blogs/big-data/build-a-concurrent-data-orchestration-pipeline-using-amazon-emr-and-apache-livy/
Nov 2018
elearningindustry.com

Top 10 Tools For The Digital Classroom - eLearning Industry

1
1. beachboundhawaii 02 Nov 2018
  
  in Public
  
  Top 10 Tools For The Digital Classroom
  
  This article presents a variety of new tools and apps that will enhance the digital classroom experience. Some of the new tools mentioned are Socrative, Scratch, Prezi, Google classroom and more!
  
  Excellent list to get your digital room started!
  
  RATING: 5/5 (rating based upon a score system 1 to 5, 1= lowest 5=highest in terms of content, veracity, easiness of use etc.)
  
  socartive etc556 etcnau digital learning digital classroom scratch app prezi google classroom selfcad quizlet adobe spark video khan academy class dojo seesaw portfolio application education technology tools free tools for teachers virtual classroom educational technology tools

Tags

digital classroom

socartive

seesaw portfolio application

adobe spark video

free tools for teachers

education technology tools

etcnau

class dojo

google classroom

virtual classroom

digital learning

educational technology tools

quizlet

selfcad

etc556

scratch app

khan academy

prezi

Annotators

beachboundhawaii

URL

elearningindustry.com/tools-for-the-digital-classroom-top-10
Oct 2018
developer.ibm.com

Build a recommender with Apache Spark and Elasticsearch - IBM Developer

1
1. SamRose 10 Oct 2018
  
  in Public
  
  elasticsearch apache spark

Tags

elasticsearch

apache spark

Annotators

SamRose

URL

developer.ibm.com/patterns/build-a-recommender-with-apache-spark-and-elasticsearch/
Jan 2018
datastrophic.io

Spark JobServer: from Spark Standalone to Mesos, Marathon and Docker. Part I

1
1. SamRose 11 Jan 2018
  
  in Public
  
  mesos spark

Tags

spark

mesos

Annotators

SamRose

URL

datastrophic.io/spark-jobserver-from-spark-standalone-to-mesos-marathon-and-docker-part-i/
medium.com

How to run a Spark cluster on Mesos on AWS – Arunkumar Eli – Medium

1
1. SamRose 11 Jan 2018
  
  in Public
  
  mesos spark aws

Tags

spark

mesos

aws

Annotators

SamRose

URL

medium.com/@elrarun/how-to-run-a-spark-cluster-on-mesos-on-aws-293e54fa7ee6
May 2017
blog.thehumangeo.com

Using Amazon Elastic Map Reduce (EMR) with Spark and Python 3.4

1
1. SamRose 29 May 2017
  
  in Public
  
  aws emr spark pyspark

Tags

spark

pyspark

emr

aws

Annotators

SamRose

URL

blog.thehumangeo.com/amazon-emr-spark-python3.html
github.com

awslabs/emr-bootstrap-actions

1
1. SamRose 29 May 2017
  
  in Public
  
  spark pyspark emr aws

Tags

spark

pyspark

emr

aws

Annotators

SamRose

URL

github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/spark-submit-via-step.md
Apr 2017
www.dezyre.com

Step-by-Step Apache Spark Installation Tutorial

1
1. SamRose 10 Apr 2017
  
  in Public
  
  spark ubuntu

Tags

spark

ubuntu

Annotators

SamRose

URL

dezyre.com/apache-spark-tutorial/apache-spark-installation-tutorial
Apr 2014
www.dbms2.com

Untitled document

1
1. aculich 30 Apr 2014
  
  in Public
  
  Mike Olson of Cloudera is on record as predicting that Spark will be the replacement for Hadoop MapReduce. Just about everybody seems to agree, except perhaps for Hortonworks folks betting on the more limited and less mature Tez. Spark’s biggest technical advantages as a general data processing engine are probably: The Directed Acyclic Graph processing model. (Any serious MapReduce-replacement contender will probably echo that aspect.) A rich set of programming primitives in connection with that model. Support also for highly-iterative processing, of the kind found in machine learning. Flexible in-memory data structures, namely the RDDs (Resilient Distributed Datasets). A clever approach to fault-tolerance.
  
  Spark's advantages:
  
  DAG processing model
  
  programming primitives for DAG model
  
  highly-iterative processing suited for ML
  
  RDD in-memory data structures
  
  clever approach to fault-tolerance
  
  brc data spark

Tags

brc

data

spark

Annotators

aculich

URL

dbms2.com/2014/04/30/spark-on-fire/