27 Matching Annotations

Mar 2023
docs.xarray.dev

Indexing and selecting data

1
1. aries1988 24 Mar 2023
  
  in Public
  
  Vectorized indexing may be used to extract information from the nearest grid cells of interest, for example, the climate model grid cells nearest to a collection of specified weather station latitudes and longitudes.
  
  moi FRD
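
The idea in the quoted passage can be sketched with plain NumPy. This is only an analogue, not the xarray API: the grid, the field values, and the station coordinates below are all made up for illustration. In xarray itself, the usual route is `sel(..., method="nearest")` with `DataArray` indexers that share a dimension.

```python
import numpy as np

# Hypothetical regular model grid (cell-center coordinates, in degrees).
grid_lat = np.array([40.0, 41.0, 42.0, 43.0])
grid_lon = np.array([0.0, 1.0, 2.0])
# Hypothetical field on that grid, shape (lat, lon).
temperature = np.arange(12, dtype=float).reshape(4, 3)

# Hypothetical weather station positions.
station_lat = np.array([40.2, 42.6])
station_lon = np.array([1.9, 0.4])

# Index of the nearest grid coordinate for every station, one axis at a time.
lat_idx = np.abs(grid_lat[None, :] - station_lat[:, None]).argmin(axis=1)
lon_idx = np.abs(grid_lon[None, :] - station_lon[:, None]).argmin(axis=1)

# Pointwise ("vectorized") indexing: one value per station, not a 2x2 block.
station_temperature = temperature[lat_idx, lon_idx]
print(station_temperature)
```

Passing two index arrays of the same length picks one grid cell per station, which is the behaviour the passage describes; indexing each axis separately would return the full rectangle of cells instead.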

Tags

FRD

moi

Annotators

aries1988

URL

docs.xarray.dev/en/stable/user-guide/indexing.html
Nov 2021
www.tensorflow.org

Tutorials | TensorFlow

10
1. aries1988 26 Nov 2021
  
  in Public
  
  If you don't have that information, you can determine which frequencies are important by extracting features with Fast Fourier Transform. To check the assumptions, here is the tf.signal.rfft of the temperature over time. Note the obvious peaks at frequencies near 1/year and 1/day:
  
  Compute an FFT with TensorFlow
  
  fft = tf.signal.rfft(df['T (degC)'])
  f_per_dataset = np.arange(0, len(fft))
  
  n_samples_h = len(df['T (degC)'])
  hours_per_year = 24*365.2524
  years_per_dataset = n_samples_h/(hours_per_year)
  
  f_per_year = f_per_dataset/years_per_dataset
  plt.step(f_per_year, np.abs(fft))
  plt.xscale('log')
  plt.ylim(0, 400000)
  plt.xlim([0.1, max(plt.xlim())])
  plt.xticks([1, 365.2524], labels=['1/Year', '1/day'])
  _ = plt.xlabel('Frequency (log scale)')
  
  DF moi howto
2. aries1988 26 Nov 2021
  
  in Public
  
  Now, peek at the distribution of the features. Some features do have long tails, but there are no obvious errors like the -9999 wind velocity value.
  
  Indeed, a peek: we are looking at test data too.
  
  df_std = (df - train_mean) / train_std
  df_std = df_std.melt(var_name='Column', value_name='Normalized')
  plt.figure(figsize=(12, 6))
  ax = sns.violinplot(x='Column', y='Normalized', data=df_std)
  _ = ax.set_xticklabels(df.keys(), rotation=90)
  
  howto
3. aries1988 26 Nov 2021
  
  in Public
  
  It is important to scale features before training a neural network. Normalization is a common way of doing this scaling: subtract the mean and divide by the standard deviation of each feature. The mean and standard deviation should only be computed using the training data so that the models have no access to the values in the validation and test sets. It's also arguable that the model shouldn't have access to future values in the training set when training, and that this normalization should be done using moving averages.
  
  Use moving averages to avoid data leakage.
  
  DD howto
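
A minimal sketch of the scaling rule quoted above, using only the standard library. The series and the 70% training share are assumptions for illustration; the tutorial applies the same idea to a pandas DataFrame. The moving-average variant mentioned at the end of the quote is not shown here.

```python
from statistics import mean, stdev

# Hypothetical series, already in time order.
values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

# Statistics come from the training portion only (first 70% here).
n_train = len(values) * 7 // 10
train = values[:n_train]
train_mean = mean(train)
train_std = stdev(train)  # sample standard deviation (ddof=1), like pandas' default .std()

# The same training statistics scale every split, so validation and test
# values never influence the mean or standard deviation.
normalized = [(v - train_mean) / train_std for v in values]
print(train_mean, train_std)
```

After this step the training portion has zero mean and unit standard deviation, while later values can fall outside that range, which is expected.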
4. aries1988 26 Nov 2021
  
  in Public
  
  You'll use a (70%, 20%, 10%) split for the training, validation, and test sets. Note the data is not being randomly shuffled before splitting. This is for two reasons: (1) it ensures that chopping the data into windows of consecutive samples is still possible; (2) it ensures that the validation/test results are more realistic, being evaluated on the data collected after the model was trained.
  
  Train, Validation, Test: 0.7, 0.2, 0.1
  
  DD ts ml
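
The unshuffled split can be sketched as plain slicing. The helper name `chronological_split` and the stand-in data are hypothetical; integer percentages are used so that the cut points do not depend on floating-point rounding.

```python
def chronological_split(rows, train_pct=70, val_pct=20):
    """Split time-ordered rows into train/validation/test without shuffling."""
    n = len(rows)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]

rows = list(range(100))  # hypothetical stand-in for time-ordered samples
train, val, test = chronological_split(rows)
print(len(train), len(val), len(test))
```

Because the slices are contiguous, every validation sample comes after every training sample, and every test sample comes after both.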
5. aries1988 26 Nov 2021
  
  in Public
  
  Similarly, the Date Time column is very useful, but not in this string form. Start by converting it to seconds:
  
  timestamp_s = date_time.map(pd.Timestamp.timestamp)
  
  and then create "Time of day" and "Time of year" signals:
  
  day = 24*60*60 year = (365.2425)*day df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day)) df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day)) df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year)) df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))
  
  howto time GP
6. aries1988 26 Nov 2021
  
  in Public
  
  The last column of the data, wd (deg), gives the wind direction in units of degrees. Angles do not make good model inputs: 360° and 0° should be close to each other and wrap around smoothly. Direction shouldn't matter if the wind is not blowing.
  
  transform WD and WS into (u, v)
  
  GP FE
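
A minimal sketch of the transformation noted above, turning speed and direction into two vector components. The function name is hypothetical, and the components follow the plain mathematical convention (x = cos, y = sin); the meteorological u/v convention uses a different sign and axis order.

```python
import math

def wind_to_vector(speed, direction_deg):
    """Return (x, y) wind components from speed and direction in degrees."""
    direction_rad = math.radians(direction_deg)
    return speed * math.cos(direction_rad), speed * math.sin(direction_rad)

# 0 and 360 degrees map to (almost exactly) the same point.
print(wind_to_vector(5.0, 0.0))
print(wind_to_vector(5.0, 360.0))
# With no wind, the direction has no effect on the result.
print(wind_to_vector(0.0, 123.0))
```

Both problems named in the quote go away: the representation wraps around smoothly, and calm conditions collapse to the origin whatever the recorded angle.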
7. aries1988 26 Nov 2021
  
  in Public
  
  One thing that should stand out is the min value of the wind velocity (wv (m/s)) and the maximum value (max. wv (m/s)) columns. This -9999 is likely erroneous.
  
  EX eda
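
One way to handle such a sentinel, sketched on a made-up list. Replacing the value with zero is an assumption here; dropping or interpolating the affected rows would be equally reasonable depending on the data.

```python
SENTINEL = -9999.0  # placeholder value that marks a bad reading

# Hypothetical wind speeds in m/s, with one erroneous entry.
wind_speed = [1.2, 0.8, SENTINEL, 2.5]

# Replace the sentinel with zero and keep every other reading unchanged.
cleaned = [0.0 if v == SENTINEL else v for v in wind_speed]
print(cleaned)
```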
8. aries1988 26 Nov 2021
  
  in Public
  
  This tutorial uses a weather time series dataset recorded by the Max Planck Institute for Biogeochemistry.
  
  data meteo
9. aries1988 26 Nov 2021
  
  in Public
  
  date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')
  
  pandas time
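
The format string in the line above uses the standard strftime-style codes, so its effect can be checked with the standard library alone. The sample timestamp is made up; note that the day comes before the month.

```python
from datetime import datetime

FORMAT = "%d.%m.%Y %H:%M:%S"  # day.month.year hour:minute:second

raw = "01.02.2009 00:10:00"  # hypothetical value in the 'Date Time' style
parsed = datetime.strptime(raw, FORMAT)
print(parsed.isoformat())
```

Passing an explicit format avoids any ambiguity between day-first and month-first dates.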
10. aries1988 26 Nov 2021
  
  in Public
  
  df.describe().transpose()
  
  GP pandas eda

Tags

pandas

DD

eda

ts

data

time

meteo

ml

DF

EX

GP

FE

howto

moi

Annotators

aries1988

URL

tensorflow.org/guide/data
Sep 2021
scikit-learn.org

1.11. Ensemble methods — scikit-learn 0.18 documentation

1
1. aries1988 26 Sep 2021
  
  in Public
  
  1.11.6. Voting Classifier
  
  Kaggle example here https://www.kaggle.com/martynovandrey/one-model-voting-from-0-81800-to-0-81837/notebook
  
  FRD KWD
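
The idea behind hard (majority) voting can be sketched in a few lines. This is not scikit-learn's implementation: `hard_vote` and the three prediction lists are hypothetical, and ties here go to the label seen first, which may differ from the library's tie-breaking. In scikit-learn the corresponding estimator is `VotingClassifier` with `voting='hard'`.

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote over per-model label lists of equal length."""
    voted = []
    for sample_labels in zip(*predictions):
        # most_common(1) returns the label with the highest count.
        voted.append(Counter(sample_labels).most_common(1)[0][0])
    return voted

# Hypothetical predictions from three models on four samples.
model_a = [0, 1, 1, 0]
model_b = [0, 0, 1, 1]
model_c = [1, 1, 1, 0]

ensemble = hard_vote([model_a, model_b, model_c])
print(ensemble)
```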

Tags

FRD

KWD

Annotators

aries1988

URL

scikit-learn.org/stable/modules/ensemble.html
scikit-learn.org

sklearn.linear_model.SGDClassifier — scikit-learn 0.17.1 documentation

1
1. aries1988 08 Sep 2021
  
  in Public
  
  max_iter: int, default=1000. The maximum number of passes over the training data (aka epochs).
  
  DF KWD NB MOI

Tags

NB

KWD

MOI

DF

Annotators

aries1988

URL

scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html
Aug 2021
r4ds.had.co.nz

R for Data Science

5
1. aries1988 21 Aug 2021
  
  in Public
  
  It’s common to think about modelling as a tool for hypothesis confirmation, and visualisation as a tool for hypothesis generation. But that’s a false dichotomy: models are often used for exploration, and with a little care you can use visualisation for confirmation. The key difference is how often do you look at each observation: if you look only once, it’s confirmation; if you look more than once, it’s exploration.
  
  NB QOT FRD
2. aries1988 21 Aug 2021
  
  in Public
  
  It’s possible to divide data analysis into two camps: hypothesis generation and hypothesis confirmation (sometimes called confirmatory analysis). The focus of this book is unabashedly on hypothesis generation, or data exploration. Here you’ll look deeply at the data and, in combination with your subject knowledge, generate many interesting hypotheses to help explain why the data behaves the way it does. You evaluate the hypotheses informally, using your scepticism to challenge the data in multiple ways.
  
  DD CP
3. aries1988 21 Aug 2021
  
  in Public
  
  We think R is a great place to start your data science journey because it is an environment designed from the ground up to support data science. R is not just a programming language, but it is also an interactive environment for doing data science. To support interaction, R is a much more flexible language than many of its peers. This flexibility comes with its downsides, but the big upside is how easy it is to evolve tailored grammars for specific parts of the data science process. These mini languages help you think about problems as a data scientist, while supporting fluent interaction between your brain and the computer.
  
  NB DF R
4. aries1988 21 Aug 2021
  
  in Public
  
  If you’re routinely working with larger data (10-100 Gb, say), you should learn more about data.table. This book doesn’t teach data.table because it has a very concise interface which makes it harder to learn since it offers fewer linguistic cues. But if you’re working with large data, the performance payoff is worth the extra effort required to learn it.
  
  DF KWD FRD
5. aries1988 21 Aug 2021
  
  in Public
  
  Starting with data ingest and tidying is sub-optimal because 80% of the time it’s routine and boring, and the other 20% of the time it’s weird and frustrating. That’s a bad place to start learning a new subject! Instead, we’ll start with visualisation and transformation of data that’s already been imported and tidied. That way, when you ingest and tidy your own data, your motivation will stay high because you know the pain is worth it.
  
  GP

Tags

QOT

DD

CP

R

FRD

KWD

DF

GP

NB

Annotators

aries1988

URL

r4ds.had.co.nz/introduction.html
Nov 2020
sspai.com

Hypothesis: an open-source, customizable web annotation tool - 少数派

1
1. aries1988 12 Nov 2020
  
  in Public
  
  Adding tags
  
  Testing tags
  
  newTag

Tags

newTag

Annotators

aries1988

URL

sspai.com/post/63033
Sep 2020
index.pmthinking.com

Product Thinking

1
1. aries1988 23 Sep 2020
  
  in Public
  
  PS is a training ground for identifying tacit knowledge. It starts off with the most basic form: recognizing something you know in the experience of another. Using resonance as your filter, you will often highlight things you “already know,” but never quite were able to express. Everything you read or watch becomes a mirror, prompting what you already know tacitly to emerge into consciousness as explicit knowledge, which you can then write down and make use of.
  
  NB #DEF

Annotators

aries1988

URL

index.pmthinking.com/Progressive-Summarization-VI-Core-Principles-of-Knowledge-Capture-c560b4472bfe49748ac006ffea374458
nesslabs.com

Getting compound interest on your thoughts with Conor White-Sullivan

1
1. aries1988 15 Sep 2020
  
  in Public
  
  So I think there’s definitely a lot of opportunity there for suggesting possibly related notes, but the act of a person seeing the connection themselves as opposed to some algorithm seeing the two words are connected, I think is pretty important, because that’s where you get more insight.
  
  GP

Annotators

aries1988

URL

nesslabs.com/conor-white-sullivan-interview
sspai.com

Matrix Roundtable | Are networked note-taking tools just a passing fad? - 少数派

5
1. aries1988 05 Sep 2020
  
  in Public
  
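  2. Writing matters more than storing, because it lures you into thinking; but writing should not be mere outpouring, it should also be a process of filtering. We are not trying to rebuild a "mini internet"; we are trying to raise the quality, concision, and usability of our content. What we should write is precisely what others find hard to write. 3. Build connections between content, not just [[keyword]] links. Meaningful connections are always "constructed" by a person, not "generated" by a tool. Drawing a line between two notes means nothing in itself, and that line adds no new insight. The real connection lives inside the notes, not in the line between them, which is only a reminder.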
  
  NB
2. aries1988 05 Sep 2020
  
  in Public
  
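  @宽治: In terms of the essence of knowledge management, the problems these tools cannot solve (that is, the ones users must solve through their own thinking) are: 1. Balancing the tension between a note's understandability and its discoverability. 2. Judging the value of content and assigning it a matching level of importance. 3. Understanding the connections between pieces of content and expressing them clearly. 4. Discovering, or even foreseeing, the ways content could be extended.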
  
  https://fortelabs.co/blog/progressive-summarization-a-practical-technique-for-designing-discoverable-notes/
3. aries1988 05 Sep 2020
  
  in Public
  
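  To make output easier, I wrote a few scripts with Keyboard Maestro that export content from Roam Research to Textbundle, docx, or reveal.js slide format with one keystroke. That way, both note organization and writing can be done without friction inside Roam Research.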
  
  FRD
4. aries1988 05 Sep 2020
  
  in Public
  
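  The "Morning Journal" and "Evening Reflection" sections are for personal records and summaries at the start and end of the day. "Input" covers what I did, learned, and found out that day; "Output" focuses on what I produced, including "Landmarks", meaning achievements and milestones worth remembering; "Personal observations" are mostly records about my physical and mental state. If I am working on a specific, fairly large task, I create a dedicated Page and jump into it to work, returning to the Journal once the task is done. Also, when I do not create a new page, I try to add a Tag to the record so that it can be indexed.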
  
  DEF
5. aries1988 05 Sep 2020
  
  in Public
  
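  That may sound abstract, so here is a concrete example. I used to write tree-structured notes chapter by chapter, but I later found that some books do not need to be read in full, or that I would pause for a while before continuing. In those cases the tree-structured notes are often left unfinished (with only one chapter, say), which looks awkward. Now I instead: first create a blank note for the book. When I reach a passage worth excerpting, I create a timestamped note, give it an easily indexed title such as "chapter-page-summary", and paste the passage in. Then, on a new line, I write why I excerpted it and what I thought!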
  
  DEF

Annotators

aries1988

URL

sspai.com/post/61886
www.nytimes.com

What Kids Around the World Eat for Breakfast

1
1. aries1988 05 Sep 2020
  
  in Public
  
  Children begin to acquire a taste for pickled egg or fermented lentils early — in the womb, even. Compounds from the foods a pregnant woman eats travel through the amniotic fluid to her baby. After birth, babies prefer the foods they were exposed to in utero, a phenomenon scientists call “prenatal flavor learning.”
  
  [[Prenatal flavor learning]]

Annotators

aries1988

URL

nytimes.com/interactive/2014/10/08/magazine/eaters-all-over.html

aries1988

Annotations: 27

Joined: September 5, 2020
